A GUIDE TO CARLESON'S THEOREM 



CIPRIAN DEMETER 



Abstract. This paper is meant to be a gentle introduction to Carleson's Theorem on 
pointwise convergence of Fourier series. 



1. INTRODUCTION 

Let

\[ S_n f(x) = \sum_{k=-n}^{n} \hat{f}(k) e^{2\pi i k x} \]

be the partial Fourier series of the $L^1$ function $f$ on $[0, 1]$. In 1966, Lennart Carleson
proved the following long-standing conjecture.

Theorem 1.1 ([5]). For each $f \in L^2([0, 1])$, the Fourier series $S_n f$ converges almost
everywhere to $f$.

Soon after that, a slight modification of Carleson's method allowed Hunt [11] to extend
the result to $L^p$ functions, for $p > 1$.

Theorem 1.1 has since received many proofs, most notably by Fefferman [10] and by 
Lacey and Thiele [13]. The impact of Carleson's Theorem has increased in recent years 
thanks to its connections with Scattering Theory [19], Ergodic Theory [7], [8], the theory
of directional singular integrals in the plane [15], [16], [6], [9], [1], [2] and the theory 
of operators with quadratic modulations [17], [18]. A more detailed description can be 
found in [12]. These connections have motivated the discovery of various new arguments 
for Theorem 1.1. While these arguments share some similarities, each of them has a 
distinct personality. Along these lines, it is interesting to note that for almost every 
specific application of Carleson's Theorem in the aforementioned fields, only one of the 
arguments will do the job. 

All the arguments for Theorem 1.1 are technical. To present the main ideas in a 
transparent way, we will instead analyze the closely related Walsh-Fourier series, which 
we recall below. 

For $n \ge 0$ the $n$-th Walsh function $w_n$ is defined recursively by the formula

\[ w_0 = 1_{[0,1)}, \]
\[ w_{2n}(x) = w_n(2x) + w_n(2x - 1), \]
\[ w_{2n+1}(x) = w_n(2x) - w_n(2x - 1). \]
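
The recursion can be made concrete on a dyadic grid. The following is a minimal sketch, not part of the original text: the function names `walsh` and `inner` are illustrative, and $w_n$ is represented by its values on the $2^{L}$ dyadic cells of $[0,1)$, which is exact as long as $n < 2^{L}$.

```python
def walsh(n, levels):
    """Values of w_n on the 2**levels dyadic cells of [0, 1), left to right.

    Requires 0 <= n < 2**levels, so that w_n is constant on each cell.
    """
    if not 0 <= n < 2 ** levels:
        raise ValueError("need 0 <= n < 2**levels")
    if n == 0:
        return [1] * (2 ** levels)
    # w_{2k}(x) = w_k(2x) + w_k(2x-1) and w_{2k+1}(x) = w_k(2x) - w_k(2x-1):
    # the left half of [0,1) carries a compressed copy of w_k, the right half
    # carries the same copy with sign +1 (n even) or -1 (n odd).
    half = walsh(n // 2, levels - 1)
    sign = 1 if n % 2 == 0 else -1
    return half + [sign * v for v in half]


def inner(u, v):
    """L^2([0,1)) inner product of two real functions given by cell values."""
    return sum(a * b for a, b in zip(u, v)) / len(u)
```

With this representation one can check directly that $w_0, \dots, w_{2^L - 1}$ are orthonormal in $L^2([0,1))$.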

Given $f : [0, 1] \to \mathbb{C}$ we recall the partial Walsh-Fourier series of $f$

\[ S_n^W f(x) = \sum_{k=0}^{n} \langle f, w_k \rangle w_k(x). \]

The following theorem was proved by Billard, by adapting Carleson's methods. 

Theorem 1.2 ([4]). For each $1 < p < \infty$ and each $f \in L^p([0, 1])$, the series $S_n^W f(x)$
converges almost everywhere to $f(x)$.
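
As a small numerical illustration of the partial sums $S_n^W f$ (a sketch under the assumption that $f$ is constant on the $2^{L}$ dyadic cells of $[0,1)$ and is passed as a list of cell values whose length is a power of two; the names `partial_sum` and `dyadic_average` are illustrative), one can compare $S_{2^k - 1}^W f$ with the average of $f$ over dyadic intervals of length $2^{-k}$. The two agree because $w_0, \dots, w_{2^k-1}$ span exactly the functions that are constant on those intervals.

```python
def walsh(n, levels):
    """Cell values of w_n on the 2**levels dyadic cells of [0, 1); needs n < 2**levels."""
    if n == 0:
        return [1] * (2 ** levels)
    half = walsh(n // 2, levels - 1)
    sign = 1 if n % 2 == 0 else -1
    return half + [sign * v for v in half]


def partial_sum(f, n):
    """S_n^W f = sum_{k=0}^{n} <f, w_k> w_k, for f given by 2**levels cell values."""
    size = len(f)
    levels = size.bit_length() - 1
    out = [0.0] * size
    for k in range(n + 1):
        w = walsh(k, levels)
        coeff = sum(a * b for a, b in zip(f, w)) / size  # <f, w_k>
        out = [o + coeff * v for o, v in zip(out, w)]
    return out


def dyadic_average(f, k):
    """Replace f by its mean on each dyadic interval of length 2**-k."""
    block = len(f) >> k
    out = []
    for start in range(0, len(f), block):
        mean = sum(f[start:start + block]) / block
        out.extend([mean] * block)
    return out
```

For instance, with eight cells, `partial_sum(f, 3)` and `dyadic_average(f, 2)` coincide up to rounding, and `partial_sum(f, 7)` returns `f` itself.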

We present a few proofs of Theorem 1.2 which are translations of their Fourier analogues.
In each case the translation can be done in more than one way; the proofs presented
reflect the author's taste. While a few of the features of the original proofs from the Fourier
case will be lost in translation, the main line of thought will be preserved essentially intact
in the Walsh case. Very little originality is claimed by the author.

This paper is by no means a complete guide to Carleson's Theorem. In particular, we
shall make no attempt to describe in detail any of its aforementioned applications. The
main goal is to give a self-contained but concise survey of some of the main arguments in
the literature.

Acknowledgements. The author would like to thank Christoph Thiele for clarifying 
discussions on the argument from [18]. Part of the material in this paper was organized 
while teaching a class on Harmonic Analysis. The author is grateful to his students 
Francesco Di Plinio and Prabath Silva for motivating him and to his collaborators for 
enriching his understanding of time-frequency analysis. 

2. The Walsh phase plane 

It turns out that there is a multiscale description for $S_n^W f$. Let $\mathcal{D}_+$ denote the collection
of all dyadic intervals which are subsets of $\mathbb{R}_+ = [0, \infty)$. We call $\mathbb{R}_+ \times \mathbb{R}_+$ the Walsh phase
plane.

Definition 2.1. A tile $p = I_p \times \omega_p$ is a rectangle of area one, such that $I_p, \omega_p \in \mathcal{D}_+$. A
bitile $P = I_P \times \omega_P$ is a rectangle of area two, such that $I_P, \omega_P \in \mathcal{D}_+$. Let $\omega_{P_l}$, $\omega_{P_u}$ be the
left (or lower) and right (or upper) halves of $\omega_P$. We will denote by $P_l = I_P \times \omega_{P_l}$ and
$P_u = I_P \times \omega_{P_u}$ the lower and upper tiles of $P$. We denote by $\mathbf{P}_{all}$ the collection of all
bitiles.

Given a tile $p = [2^j m, 2^j (m + 1)] \times [2^{-j} n, 2^{-j} (n + 1)]$ we define the associated Walsh
wave packet

\[ W_p(x) = 2^{-j/2} w_n(2^{-j} x - m). \]

To understand the relevance of the Walsh phase plane, we recall a few tools from [22], 
see also [21]. 

Every $x \in \mathbb{R}_+ = [0, \infty)$ can be identified uniquely with a doubly-infinite set of binary
digits $a_n = a_n(x)$ such that

\[ x = \sum_{n \in \mathbb{Z}} a_n 2^n \]

where $a_n \in \{0, 1\}$ and $\liminf_{n \to -\infty} a_n = 0$. Note that $a_n$ is eventually zero as $n \to \infty$.
We define two operations on $\mathbb{R}_+$. First,

\[ x \oplus y := \sum_{n} b_n 2^n \]

where

\[ b_n := a_n(x) + a_n(y) \mod 2. \]

We caution the reader that $b_n$ is not always the same as $a_n(x \oplus y)$, as the example
$x = \sum_{n < 0,\ n \text{ odd}} 2^n$, $y = \sum_{n < 0,\ n \text{ even}} 2^n$ shows. Also, since $1 = x \oplus y = x \oplus z$ where
$z = 1 + \sum_{n < 0,\ n \text{ odd}} 2^n$, $(\mathbb{R}_+, \oplus)$ is not, strictly speaking, a group. We will be content with
observing that for all practical purposes $(\mathbb{R}_+, \oplus)$ can be thought of as being a group, in
the sense that $\oplus$ behaves like a genuine group operation if we exclude a set of pairs $(x, y)$ of zero
product Lebesgue measure.

Define the second operation by

\[ x \otimes y := \sum_{n} c_n 2^n \]

where

\[ c_n := \sum_{m \in \mathbb{Z}} a_m(x) a_{n-m}(y) \mod 2. \]

We note that this sum is always finite. If we neglect zero measure sets, $(\mathbb{R}_+, \oplus, \otimes)$ can
be thought of as being a field with characteristic two. It will be implicitly assumed that
various equalities to follow hold outside zero measure sets.

Define the function $e_W : \mathbb{R}_+ \to \{-1, 1\}$ such that $e_W(x) = 1$ when $a_{-1}(x) = 0$ and
$e_W(x) = -1$ when $a_{-1}(x) = 1$. This 1-periodic function is the Walsh analogue of $e^{2\pi i x}$. It
is easy to check that

\[ w_n(x) = e_W(x \otimes n) 1_{[0,1]}(x), \]

thus $w_n$ can be thought of as being the Walsh analogue of $e^{2\pi i n x}$. Also, for each tile
$p = I_p \times \omega_p$ we have

\[ W_p(x) = \frac{1}{|I_p|^{1/2}} w_0\Big( \frac{x - l(I_p)}{|I_p|} \Big) e_W(x \otimes l(\omega_p)), \]

where $l(J)$ denotes the left endpoint of $J$. A simple computation shows that for each
bitile $P$

\[ W_{P_l}(x) = W_{P_u}(x) e_{I_P}(x) \tag{1} \]

where $e_I(x) = 1$ on $I_l$ and $e_I(x) = -1$ on $I_r$, the left and right halves of $I$.
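
The identity $w_n(x) = e_W(x \otimes n)$ on $[0,1)$ can be tested on dyadic rationals. The sketch below is illustrative only: it assumes $x = i / 2^{L}$ is stored through its integer numerator $i$, so that $\oplus$ becomes the bitwise XOR of numerators and $\otimes$ becomes a carry-less product (the helper names `clmul`, `e_walsh` and `walsh_via_product` are not from the text).

```python
def walsh(n, levels):
    """Cell values of w_n on the 2**levels dyadic cells of [0, 1); needs n < 2**levels."""
    if n == 0:
        return [1] * (2 ** levels)
    half = walsh(n // 2, levels - 1)
    sign = 1 if n % 2 == 0 else -1
    return half + [sign * v for v in half]


def clmul(a, b):
    """Carry-less product of non-negative integers: binary digits are
    convolved and reduced mod 2, exactly as in the definition of c_n."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def e_walsh(numer, shift):
    """e_W at x = numer / 2**shift: +1 if the digit a_{-1}(x) is 0, else -1."""
    digit = (numer >> (shift - 1)) & 1 if shift >= 1 else 0
    return -1 if digit else 1


def walsh_via_product(n, levels):
    """w_n at the left endpoints i / 2**levels of the cells, as e_W(x (x) n)."""
    # x = i / 2**levels and n = n / 2**0, so x (x) n = clmul(i, n) / 2**levels.
    return [e_walsh(clmul(i, n), levels) for i in range(2 ** levels)]
```

Both constructions of $w_n$ agree on every cell for $n < 2^{L}$, and `clmul` distributes over XOR, in line with the field structure described above.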

The collection of all wave packets $W_p$ where $p$ ranges over all tiles with fixed scale forms
a complete orthonormal system in $L^2(\mathbb{R}_+)$ and thus

\[ f = \sum_{p:\ |I_p| = 2^j} \langle f, W_p \rangle W_p. \]

We introduce the Walsh (also called Walsh-Fourier) transform of a function $f : \mathbb{R}_+ \to \mathbb{C}$
to be

\[ \mathcal{F}_W f(\xi) = \hat{f}(\xi) := \int e_W(x \otimes \xi) f(x)\, dx. \]

It is easy to see that its inverse coincides with $\mathcal{F}_W$.

Arguably the most important feature that makes the Walsh phase plane technically
simpler than its Fourier counterpart is the absence of the strong form of the "Uncertainty
Principle". This allows the existence of functions compactly supported in both time
and frequency. The best example is $1_{[0,1]}$, which equals its Walsh transform. A quick
computation shows that for each interval $I \in \mathcal{D}_+$

\[ \widehat{1_I}(\xi) = |I| 1_{[0, |I|^{-1}]}(\xi) e_W(\xi \otimes x_I) \tag{2} \]

where $x_I$ is an arbitrary element of $I$. Similarly

\[ \widehat{W_p}(\xi) = \frac{1}{|\omega_p|^{1/2}} w_0\Big( \frac{\xi - l(\omega_p)}{|\omega_p|} \Big) e_W(\xi \otimes l(I_p)). \]

Thus $W_p$ is spatially supported in $I_p$ while its Walsh transform is supported in $\omega_p$. An
application of Plancherel's theorem shows that

\[ \langle W_p, W_{p'} \rangle = 0 \tag{3} \]

whenever the tiles $p$ and $p'$ do not intersect.
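
Both (1) and (3) can be verified numerically for tiles inside $[0,1) \times [0, 2^{L})$. In the sketch below (illustrative names; a tile is encoded as a hypothetical triple `(s, m, n)` standing for $[m 2^{-s}, (m+1) 2^{-s}) \times [n 2^{s}, (n+1) 2^{s})$, that is $j = -s$ in the notation above), wave packets are sampled on the $2^{L}$ dyadic cells of $[0,1)$.

```python
def walsh(n, levels):
    """Cell values of w_n on the 2**levels dyadic cells of [0, 1); needs n < 2**levels."""
    if n == 0:
        return [1] * (2 ** levels)
    half = walsh(n // 2, levels - 1)
    sign = 1 if n % 2 == 0 else -1
    return half + [sign * v for v in half]


def wave_packet(s, m, n, levels):
    """Cell values of W_p(x) = 2^{s/2} w_n(2^s x - m) for the tile (s, m, n)."""
    width = 2 ** (levels - s)          # number of cells inside I_p
    values = [0.0] * (2 ** levels)
    for i, v in enumerate(walsh(n, levels - s)):
        values[m * width + i] = 2 ** (s / 2) * v
    return values


def inner(u, v):
    """L^2([0,1)) inner product of two real functions given by cell values."""
    return sum(a * b for a, b in zip(u, v)) / len(u)


def tiles_intersect(p, q, levels):
    """True if the half-open rectangles p and q overlap."""
    def box(t):
        s, m, n = t
        width = 2 ** (levels - s)      # time measured in units of 2**-levels
        return m * width, (m + 1) * width, n * 2 ** s, (n + 1) * 2 ** s
    a0, a1, b0, b1 = box(p)
    c0, c1, d0, d1 = box(q)
    return a0 < c1 and c0 < a1 and b0 < d1 and d0 < b1


def all_tiles(levels):
    """All tiles with I_p inside [0,1) and omega_p inside [0, 2**levels)."""
    return [(s, m, n) for s in range(levels + 1)
            for m in range(2 ** s) for n in range(2 ** (levels - s))]
```

Looping over all pairs from `all_tiles(3)` shows that disjoint tiles give orthogonal wave packets, while each wave packet has unit norm.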

The following partial relation of order was introduced by C. Fefferman [10]. 

Definition 2.2 (Order). For two tiles or bitiles $P, P'$ we write $P \le P'$ if $I_P \subseteq I_{P'}$ and
$\omega_{P'} \subseteq \omega_P$.

Note that $P$ and $P'$ are comparable under $\le$ if and only if they intersect as sets. We
will refer to maximal (or minimal) tiles (or bitiles) with respect to $\le$ as simply being
maximal (or minimal).

Definition 2.3 (Convexity). A collection $\mathbf{P}$ of bitiles is called convex if whenever $P, P'' \in \mathbf{P}$,
$P' \in \mathbf{P}_{all}$ and $P \le P' \le P''$, we must also have $P' \in \mathbf{P}$.

For a collection $\mathbf{p}$ of tiles or bitiles we denote by $A(\mathbf{p}) = \bigcup_{p \in \mathbf{p}} I_p \times \omega_p$ the region in $\mathbb{R}_+^2$
covered by them.

We will use the fact (see Lemma 2.5 in [22]) that for each convex set of bitiles $\mathbf{P}$, the
region $A(\mathbf{P})$ can be written (not necessarily in a unique way) as a disjoint union of tiles
$\mathbf{p}$

\[ A(\mathbf{P}) = A(\mathbf{p}). \tag{4} \]

We can identify any region in the Walsh phase plane which is a finite union of pairwise
disjoint tiles $p \in \mathbf{p}$ with the subspace of $L^2(\mathbb{R}_+)$ spanned by $(W_p)_{p \in \mathbf{p}}$. Indeed, it turns out
that if two such collections $\mathbf{p}$ and $\mathbf{p}'$ of tiles cover the same area in the phase plane, then
$(W_p)_{p \in \mathbf{p}}$ and $(W_p)_{p \in \mathbf{p}'}$ span the same vector space in $L^2(\mathbb{R}_+)$ (see Corollary 2.7 in [22]).
In particular

\[ \sum_{p \in \mathbf{p}} \langle f, W_p \rangle W_p(x) = \sum_{p \in \mathbf{p}'} \langle f, W_p \rangle W_p(x). \tag{5} \]

The projection operator onto this subspace

\[ \Pi_{\mathbf{p}} f(x) = \sum_{p \in \mathbf{p}} \langle f, W_p \rangle W_p(x) \]

will be referred to as the phase space projection onto $\mathbf{p}$. Note that (3) guarantees that $(W_p)_{p \in \mathbf{p}}$ forms
an orthonormal basis of the range of $\Pi_{\mathbf{p}}$. In particular, if $\mathbf{P}$ is a convex union of bitiles
and $\mathbf{p}$ satisfies (4), then we abuse notation and define

\[ \Pi_{\mathbf{P}} f := \Pi_{\mathbf{p}} f. \]

An easy induction argument proves 

Lemma 2.4 (Corollary 2.4, [22]). Let $\mathbf{p}, \mathbf{p}'$ be finite collections of pairwise disjoint tiles
such that $A(\mathbf{p}') \subseteq A(\mathbf{p})$. Then there exists a collection $\mathbf{p}''$ of pairwise disjoint tiles which
includes all the tiles in $\mathbf{p}'$ such that $A(\mathbf{p}) = A(\mathbf{p}'')$.

Lemma 2.4 and (3) will imply that for each convex collection $\mathbf{P}$ of bitiles and for each
$P \in \mathbf{P}$

\[ \langle f, W_p \rangle = \langle \Pi_{\mathbf{P}} f, W_p \rangle, \quad p \in \{P_u, P_l\}. \tag{6} \]

Remark 2.5. We mention that the relation of order as well as concepts such as convexity 
and phase space projections can be extended naturally to the two (or higher) dimensional 
case. This will be explored in Section 8. 

Fix now $n \ge 0$. Note that $S_n^W f(x) = \Pi_{\mathbf{p}_n} f(x)$, where $\mathbf{p}_n$ is the collection of the tiles
$[0, 1] \times [k, k + 1]$, $0 \le k \le n$. We will partition $A(\mathbf{p}_n) = [0, 1] \times [0, n + 1]$ in a different
way. Namely, for each point $(x, \xi) \in [0, 1] \times [0, n + 1]$, there exists a unique bitile $P$ such
that $(x, \xi) \in P_l$ and $(x, n + 1) \in P_u$. This bitile is precisely the minimal one such that
$n + 1, \xi \in \omega_P$ and $x \in I_P$. Note that the tiles $P_l$ corresponding to all these $P$ will partition
$[0, 1] \times [0, n + 1]$. But then (5) will imply that

\[ S_n^W f(x) = \sum_{P \in \mathbf{P}_{all}:\ n + 1 \in \omega_{P_u}} \langle f, W_{P_l} \rangle W_{P_l}(x), \]

in particular

\[ \sup_n |S_n^W f(x)| = \Big| \sum_{P \in \mathbf{P}_{all}} \langle f, W_{P_l} \rangle W_{P_l}(x) 1_{\omega_{P_u}}(N(x)) \Big| \]

for a suitable function $N : \mathbb{R}_+ \to \mathbb{N}$. The roles of $P_u$ and $P_l$ can be interchanged, without
altering the nature or the difficulty of the problem. For pedagogical reasons we choose to work with the
model sums

\[ C_{\mathbf{P}} f(x) = \sum_{P \in \mathbf{P}} \langle f, W_{P_u} \rangle W_{P_u}(x) 1_{\omega_{P_l}}(N(x)), \]

where $\mathbf{P} \subseteq \mathbf{P}_{all}$ and $f : \mathbb{R}_+ \to \mathbb{C}$. Using the standard approximation argument combined
with the almost everywhere convergence of $S_n^W f(x)$ for characteristic functions of intervals,
Theorem 1.2 will follow from the following inequality.

Theorem 2.6. We have for each $1 < p < \infty$, $N : \mathbb{R}_+ \to \mathbb{R}_+$ and $f \in L^p(\mathbb{R}_+)$

\[ \| C_{\mathbf{P}_{all}} f \|_p \le C_p \| f \|_p. \]

The constant $C_p$ does not depend on $f$ and $N$.

Note that to recover Theorem 1.2 we could restrict attention to functions $f$ on $[0, 1]$
and to the bitiles spatially supported in $[0, 1]$. We will do so in some, but not all the
proofs to follow. We will always allow the choice function $N$ to take any value in $\mathbb{R}_+$, not
just integers.
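
The regrouping of $S_n^W f$ into lower tiles of bitiles, described before Theorem 2.6, can be checked numerically. The sketch below is illustrative and assumes $f$ is given by its values on $2^{L}$ dyadic cells of $[0,1)$ with $n < 2^{L}$; the encoding of tiles by triples `(s, m, n)` and all function names are assumptions made for the example. At scale $|I_P| = 2^{-s}$ a bitile with $n + 1 \in \omega_{P_u}$ exists exactly when $q = \lfloor (n+1) / 2^{s} \rfloor$ is odd, and its lower tile then has frequency index $q - 1$.

```python
def walsh(n, levels):
    """Cell values of w_n on the 2**levels dyadic cells of [0, 1); needs n < 2**levels."""
    if n == 0:
        return [1] * (2 ** levels)
    half = walsh(n // 2, levels - 1)
    sign = 1 if n % 2 == 0 else -1
    return half + [sign * v for v in half]


def wave_packet(s, m, n, levels):
    """Cell values of W_p for the tile [m 2^-s, (m+1) 2^-s) x [n 2^s, (n+1) 2^s)."""
    width = 2 ** (levels - s)
    values = [0.0] * (2 ** levels)
    for i, v in enumerate(walsh(n, levels - s)):
        values[m * width + i] = 2 ** (s / 2) * v
    return values


def inner(u, v):
    return sum(a * b for a, b in zip(u, v)) / len(u)


def partial_sum(f, n):
    """Direct definition: S_n^W f = sum_{k <= n} <f, w_k> w_k."""
    levels = len(f).bit_length() - 1
    out = [0.0] * len(f)
    for k in range(n + 1):
        w = walsh(k, levels)
        coeff = inner(f, w)
        out = [o + coeff * v for o, v in zip(out, w)]
    return out


def regrouped_sum(f, n):
    """Sum of <f, W_{P_l}> W_{P_l} over bitiles P with n + 1 in omega_{P_u}."""
    levels = len(f).bit_length() - 1
    out = [0.0] * len(f)
    for s in range(levels + 1):
        q = (n + 1) >> s
        if q % 2 == 1:  # n + 1 lies in the upper half of omega_P at this scale
            for m in range(2 ** s):
                w = wave_packet(s, m, q - 1, levels)  # lower tile of P
                coeff = inner(f, w)
                out = [o + coeff * v for o, v in zip(out, w)]
    return out
```

The two functions return the same values up to rounding, for every admissible $n$.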

3. Estimates for a single tree

Throughout the paper we will denote by $Mf$ the Hardy-Littlewood maximal function
of $f$. The various implicit constants hidden in the notation $\lesssim$ will typically be allowed to
depend on the Hölder exponents $p$, $p_i$, $s_j$ etc.

Fefferman [10] organized the bitiles in structures that he named trees and forests. The
restriction $C_T$ to a tree will be a Calderón-Zygmund object which can be investigated
with classical methods. The contribution of forests is controlled by using various forms
of orthogonality between the tree operators $C_T$. All the approaches described in the
following sections will rely on this strategy.

Definition 3.1. Let $I_T \in \mathcal{D}_+$ and $\xi_T \in \mathbb{R}_+ \setminus \{n 2^{-k} : n, k \in \mathbb{Z}\}$. A tree $T$ with top data
$(I_T, \xi_T)$ is a collection of bitiles such that $I_P \subseteq I_T$ and $\xi_T \in \omega_P$ for each $P \in T$. If
$P_T \in \mathbf{P}_{all}$ is such that $P \le P_T$ for each $P \in T$, we call $P_T$ a top bitile for $T$. Note that
such a $P_T$ is not unique and $T$ need not contain a top bitile.

A tree is called overlapping if the tiles $\{P_u : P \in T\}$ intersect. A tree is called lacunary
if the tiles $\{P_l : P \in T\}$ intersect.

Each tree can be decomposed as $T = T_l \cup T_o$ where

\[ T_l = \{P \in T : \xi_T \in \omega_{P_l}\}, \]
\[ T_o = \{P \in T : \xi_T \in \omega_{P_u}\}. \]

Note that $T_l$ is lacunary while $T_o$ is overlapping. Moreover, if $T$ is convex then so are the
trees $T_l$ and $T_o$. These observations will allow us to always assume the tree we deal with
is either overlapping or lacunary.

The classical example of a lacunary tree is the Littlewood-Paley tree consisting of all
bitiles of the form $P^I := I \times [0, 2|I|^{-1}]$, $I \in \mathcal{D}_+$. Note that for each $P^I$ in the Littlewood-Paley
tree we have $W_{P^I_u}(x) = h_I(x)$, where $h_I$ is the $L^2$ normalized Haar function equal to
$|I|^{-1/2}$ on the left half $I_l$ and to $-|I|^{-1/2}$ on the right half $I_r$. Recall that if $X$ is a subset
of the Littlewood-Paley tree we have

\[ \Big\| \sum_{P^I \in X} \langle f, h_I \rangle h_I \Big\|_p \lesssim \| f \|_p, \quad 1 < p < \infty. \tag{7} \]

For a lacunary tree $T$ we denote

\[ O_T f(x) = \sum_{P \in T} \langle f, W_{P_u} \rangle W_{P_u}(x). \]

We have the following generalization of (7).

Lemma 3.2 (Single tree estimate: singular integral). Let $T$ be a lacunary tree. Then for
each $1 < p < \infty$

\[ \| O_T f \|_p \lesssim \| f \|_p. \]

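Proof. Recall that the Walsh transform of $W_{P_u}$ is supported in $\omega_{P_u}$. Call $\Omega$ the collection of all
intervals $\omega_{P_u}$, $P \in T$, and note that they are pairwise disjoint and sit within distance smaller
than their length from $\xi_T$. By the Walsh version of the Littlewood-Paley Theorem applied to $O_T f$
and then to $f$ we have

\[ \| O_T f \|_p \lesssim \Big\| \Big( \sum_{\omega \in \Omega} | \mathcal{F}_W^{-1}(1_\omega \mathcal{F}_W(O_T f)) |^2 \Big)^{1/2} \Big\|_p \]
\[ = \Big\| \Big( \sum_{\omega \in \Omega} \Big| \sum_{P \in T:\ |\omega_{P_u}| = |\omega|} \langle f, W_{P_u} \rangle W_{P_u} \Big|^2 \Big)^{1/2} \Big\|_p = \Big\| \Big( \sum_{P \in T} |\langle f, W_{P_u} \rangle|^2 \frac{1_{I_P}}{|I_P|} \Big)^{1/2} \Big\|_p \lesssim \| f \|_p. \]

■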

The next lemma shows that the operator $C_{\mathbf{P}}$ restricted to a tree $T$ is a maximal
function, if $T$ is overlapping, and a maximal truncation of a discrete Calderón-Zygmund
operator, if $T$ is lacunary.

Lemma 3.3 (Single tree estimate: maximal truncations). Let $T$ be a tree. Then for each
$1 < p < \infty$

\[ \Big\| \sum_{P \in T} \langle f, W_{P_u} \rangle W_{P_u}(x) 1_{\omega_{P_l}}(N(x)) \Big\|_{L^p(\mathbb{R}_+)} \lesssim \| f \|_p. \]

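Proof. It suffices to prove the lemma when $T$ is either lacunary or overlapping. We start
with the lacunary case. Note that for each $x$ there is $k = k(x)$ such that

\[ \sum_{P \in T} \langle f, W_{P_u} \rangle W_{P_u}(x) 1_{\omega_{P_l}}(N(x)) = \sum_{P \in T:\ |\omega_P| \ge 2^k} \langle f, W_{P_u} \rangle W_{P_u}(x). \]

Note that if $P, P' \in T$ and $|\omega_P| > |\omega_{P'}|$ then $\omega_{P_u}$ is disjoint from and sits at the right of
$\omega_{P'_u}$. Thus there will exist an interval $\omega = \omega(x)$ which contains all $\omega_{P_u}$ with $|\omega_P| \ge 2^k$ and
which will have empty intersection with all $\omega_{P_u}$ satisfying $|\omega_P| < 2^k$. We can thus write

\[ \sum_{P \in T:\ |\omega_P| \ge 2^k} \langle f, W_{P_u} \rangle W_{P_u}(x) = \mathcal{F}_W^{-1}[1_\omega \mathcal{F}_W[O_T f]](x). \]

Using (2) and the fact that $x \oplus [0, |\omega|^{-1}]$ is an interval of length $|\omega|^{-1}$ containing $x$ we get

\[ \Big| \sum_{P \in T:\ |\omega_P| \ge 2^k} \langle f, W_{P_u} \rangle W_{P_u}(x) \Big| \le |\omega| \int_{x \oplus [0, |\omega|^{-1}]} |O_T f(y)|\, dy \lesssim M(O_T f)(x). \]

It now suffices to apply Lemma 3.2.

Assume next that $T$ is overlapping. This case is immediate by noting that for each $x$

\[ \Big| \sum_{P \in T} \langle f, W_{P_u} \rangle W_{P_u}(x) 1_{\omega_{P_l}}(N(x)) \Big| = |\langle f, W_{P_u} \rangle W_{P_u}(x)| \le M(f)(x), \]

where $P$ is the unique (possibly nonexistent) bitile in $T$ with $(x, N(x)) \in P_l$. ■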

We remark that (6) implies that whenever $T$ is convex and $P \in T$ we have
$\langle f, W_{P_u} \rangle = \langle \Pi_T f, W_{P_u} \rangle$. Thus the result of Lemma 3.3 can also be written in a localized form

\[ \| C_T f \|_p \lesssim \| \Pi_T f \|_p. \tag{8} \]

We close this section by proving $L^p$ estimates for the phase space projection associated
with a tree.

Proposition 3.4. For each convex tree $T$ and each $1 < p \le \infty$

\[ \| \Pi_T f \|_p \lesssim \| f \|_p. \]

Proof. We first observe that for each $x \in I_T$, $\Pi_T f(x) = \Pi_P f(x)$ where $P$ is the minimal
bitile in $T$ with $x \in I_P$. Indeed, according to Lemma 2.4, there exists a collection $\mathbf{p}''$ of
pairwise disjoint tiles which includes $P_u$ and $P_l$ such that

\[ \Pi_T f(x) = \Pi_P f(x) + \sum_{p \in \mathbf{p}'' \setminus \{P_u, P_l\}} \Pi_p f(x). \]

But $P$ is minimal, hence $I_p \cap I_P = \emptyset$ for each $p \in \mathbf{p}'' \setminus \{P_u, P_l\}$.
Finally, note that for each bitile $P$ and each $x \in I_P$ we have

\[ |\Pi_P f(x)| \le |I_P|^{-1/2} (|\langle f, W_{P_u} \rangle| + |\langle f, W_{P_l} \rangle|) \lesssim \frac{1}{|I_P|} \int_{I_P} |f| \le Mf(x). \]



4. Size and pointwise estimates outside exceptional sets 

We are now ready to see the first proof of Theorem 2.6. This argument bears some 
resemblance to the original argument of Carleson [5]. A form of this argument has been 
used in [19] in the Walsh case, while the proof of the Fourier case is hidden in [7]. This 
type of argument proved instrumental in applications to the Return Times Theorem [7] 
and the directional Hilbert transform in the plane [9]. 

The main tool is the size of a collection of bitiles, a concept introduced by Lacey and
Thiele [14] in the Fourier case and by Thiele [22] (see footnote 1) in the Walsh case.

Definition 4.1. The size of a collection $\mathbf{P}$ of bitiles with respect to a function $f : \mathbb{R}_+ \to \mathbb{C}$
is defined as

\[ \mathrm{size}_f(\mathbf{P}) = \sup_{P \in \mathbf{P}} \frac{\| \Pi_P f \|_2}{|I_P|^{1/2}}. \]

The next two propositions record some of the key features of $\mathrm{size}_f(\mathbf{P})$.

Proposition 4.2. Let $T$ be a convex tree. Then for each $1 < p \le \infty$

\[ \| \Pi_T f \|_p \lesssim \mathrm{size}_f(T) |I_T|^{1/p}. \tag{9} \]

Proof. It suffices to prove that $\| \Pi_T f \|_\infty \lesssim \mathrm{size}_f(T)$. The proof of Proposition 3.4 shows
that for each $x \in I_T$, $\Pi_T f(x) = \Pi_P f(x)$ where $P$ is the minimal bitile in $T$ with $x \in I_P$.
To close the argument, observe that for each bitile $P$ we have

\[ \| \Pi_P f \|_\infty \le |I_P|^{-1/2} (|\langle f, W_{P_u} \rangle| + |\langle f, W_{P_l} \rangle|) \]
\[ \le |I_P|^{-1/2} \sqrt{2} (|\langle f, W_{P_u} \rangle|^2 + |\langle f, W_{P_l} \rangle|^2)^{1/2} = |I_P|^{-1/2} \sqrt{2} \| \Pi_P f \|_2. \]



Proposition 4.3. For each $\mathbf{P}$ and $f : \mathbb{R}_+ \to \mathbb{C}$,

\[ \mathrm{size}_f(\mathbf{P}) \lesssim \sup_{P \in \mathbf{P}} \inf_{x \in I_P} M(f)(x). \]



Footnote 1. It is there referred to as density.

Proof. This is immediate, since for each $P$

\[ \| \Pi_P f \|_2 = (|\langle f, W_{P_u} \rangle|^2 + |\langle f, W_{P_l} \rangle|^2)^{1/2} \lesssim \frac{1}{|I_P|^{1/2}} \int_{I_P} |f|. \]



Definition 4.4. A forest $\mathcal{F}$ is a finite collection of bitiles which consists of a disjoint
union of convex trees.

We note that each finite convex collection P of bitiles can be turned into a forest, 
possibly in more than one way. Indeed, start with a maximal element P from P, and 
construct the maximal tree in P with top bitile P. Remove this tree T from P and note 
that P \ T remains convex. Repeat the procedure with P \ T replacing P, to select the 
next tree. Iterate this until all the bitiles from the original P are selected. 

We will sometimes use the notation $N_{\mathcal{F}}$ for the counting function of a forest $\mathcal{F}$

\[ N_{\mathcal{F}}(x) = \sum_{T \in \mathcal{F}} 1_{I_T}(x). \]

A key idea in many of the approaches to Carleson's Theorem is to split the bitiles into 
forests with a certain size. 

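Lemma 4.5. Let $\mathbf{P}$ be a finite convex collection of bitiles and let $f : \mathbb{R}_+ \to \mathbb{C}$. Then
$\mathbf{P} = \mathbf{P}_{lo} \cup \mathbf{P}_{hi}$, such that

• both $\mathbf{P}_{lo}$ and $\mathbf{P}_{hi}$ are convex

• $\mathrm{size}_f(\mathbf{P}_{lo}) \le \frac{1}{2} \mathrm{size}_f(\mathbf{P})$

• $\mathbf{P}_{hi}$ is a convex forest with trees $T \in \mathcal{F}$ satisfying

\[ \sum_{T \in \mathcal{F}} |I_T| \lesssim \mathrm{size}_f(\mathbf{P})^{-2} \| f \|_2^2. \]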

Proof. This is a recursive procedure. Set $\mathbf{P}_{stock} := \mathbf{P}$ and $\mathcal{F} = \emptyset$. Select a maximal bitile
$t \in \mathbf{P}_{stock}$ such that

\[ \frac{\| \Pi_t f \|_2}{|I_t|^{1/2}} > \frac{\mathrm{size}_f(\mathbf{P})}{2}. \]

Define

\[ T(t) = \{P \in \mathbf{P}_{stock} : P \le t\} \]

and note that since $\mathbf{P}_{stock}$ is convex, both $\mathbf{P}_{stock} \setminus T(t)$ and the tree $T(t)$ will be convex.
Add $T(t)$ to the family $\mathcal{F}$. Reset $\mathbf{P}_{stock} := \mathbf{P}_{stock} \setminus T(t)$, and restart the procedure.

The algorithm is over when there is no $t$ to be selected. Then define $\mathbf{P}_{hi}$ to consist of
the union of all bitiles in all the trees from $\mathcal{F}$, and let $\mathbf{P}_{lo} = \mathbf{P} \setminus \mathbf{P}_{hi}$.

The first two needed properties as well as the convexity of $T$ are quite immediate. By
maximality the selected bitiles $t$ are pairwise disjoint, and thus the functions $\Pi_t f$ are
pairwise orthogonal, thanks to (3). It follows that

\[ \sum_{T \in \mathcal{F}} |I_T| = \sum_t |I_t| \le 4\, \mathrm{size}_f(\mathbf{P})^{-2} \sum_t \| \Pi_t f \|_2^2 \le 4\, \mathrm{size}_f(\mathbf{P})^{-2} \| f \|_2^2. \]
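
The selection procedure in the proof of Lemma 4.5 can be sketched in code. This is an illustration under simplifying assumptions, not the author's implementation: $f$ is given by its values on $2^{L}$ dyadic cells of $[0,1)$, a bitile is encoded as a hypothetical triple `(s, m, k)` with $I_P = [m 2^{-s}, (m+1) 2^{-s})$ and $\omega_P = [k 2^{s+1}, (k+1) 2^{s+1})$, and the starting collection is the (convex) family of all such bitiles resolved by the grid.

```python
import math


def walsh(n, levels):
    """Cell values of w_n on the 2**levels dyadic cells of [0, 1); needs n < 2**levels."""
    if n == 0:
        return [1] * (2 ** levels)
    half = walsh(n // 2, levels - 1)
    sign = 1 if n % 2 == 0 else -1
    return half + [sign * v for v in half]


def bitile_ratio(f, P):
    """||Pi_P f||_2 / |I_P|^{1/2} for the bitile P = (s, m, k)."""
    s, m, k = P
    size = len(f)
    levels = size.bit_length() - 1
    width = size >> s
    cells = f[m * width:(m + 1) * width]
    total = 0.0
    for n in (2 * k, 2 * k + 1):  # lower and upper tile of P
        w = walsh(n, levels - s)
        coeff = 2 ** (s / 2) * sum(a * b for a, b in zip(cells, w)) / size
        total += coeff ** 2
    return math.sqrt(total) * 2 ** (s / 2)


def leq(P, Q):
    """P <= Q: I_P inside I_Q and omega_Q inside omega_P."""
    (s, m, k), (t, a, b) = P, Q
    return s >= t and (m >> (s - t)) == a and (b >> (s - t)) == k


def all_bitiles(levels):
    return [(s, m, k) for s in range(levels)
            for m in range(2 ** s) for k in range(2 ** (levels - s - 1))]


def split_by_size(f, bitiles):
    """Greedy selection: returns (size_f(P), selected tops t, P_lo)."""
    ratios = {P: bitile_ratio(f, P) for P in bitiles}
    size_all = max(ratios.values())
    stock, tops = set(bitiles), []
    while True:
        heavy = [P for P in stock if ratios[P] > size_all / 2]
        if not heavy:
            break
        # Smallest s means largest I_P, so this bitile is maximal under <=
        # among the heavy ones still in stock.
        top = min(heavy)
        tops.append(top)
        stock -= {P for P in stock if leq(P, top)}  # remove the tree T(top)
    return size_all, tops, stock
```

On sample data one can confirm that the selected tops are pairwise incomparable, that the remaining bitiles have size at most half of the original, and that $\sum_t |I_t| \le 4\, \mathrm{size}_f(\mathbf{P})^{-2} \|f\|_2^2$.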



We can iterate the lemma to obtain 

Proposition 4.6 (Size decomposition). Let $\mathbf{P}$ be a finite convex collection of bitiles. Then

\[ \mathbf{P} = \bigcup_{2^{-n} \lesssim \mathrm{size}_f(\mathbf{P})} \mathbf{P}_n \cup \mathbf{P}_{null}, \]

such that

• $\mathrm{size}_f(\mathbf{P}_n) \le 2^{-n}$

• $\mathbf{P}_n$ is a convex forest with trees $T \in \mathcal{F}_n$ satisfying

\[ \sum_{T \in \mathcal{F}_n} |I_T| \lesssim 2^{2n} \| f \|_2^2 \tag{10} \]

• $\Pi_P f = 0$ for each $P \in \mathbf{P}_{null}$.

We are now ready for the main line of the argument.

Proof [of Theorem 2.6]. By using restricted type interpolation and a limiting argument,
it suffices to prove

\[ |\{x : |C_{\mathbf{P}} f(x)| \gtrsim \lambda\}| \lesssim \frac{1}{\lambda^p} |E| \tag{11} \]

for each $E \subset \mathbb{R}_+$ with finite measure, each $|f| \le 1_E$, each finite convex $\mathbf{P} \subset \mathbf{P}_{all}$, for each
$\lambda > 1$, $4 < p < \infty$ and also for each $0 < \lambda \le 1$, $1 < p < \infty$. The implicit constant in the
inequality (11) will only depend on $p$.

We start with the simpler case $\lambda > 1$, $4 < p < \infty$. Note that the bound $\mathrm{size}_f(\mathbf{P}) \lesssim 1$
follows from Proposition 4.3. Let $\mathbf{P}_n$, $\mathcal{F}_n$ with $2^{-n} \lesssim 1$, be the collections from Proposition
4.6. For each $T \in \mathcal{F}_n$ with top bitile $P_T \in T$, define the saturation
$T^* = \{P \in \mathbf{P}_n : P \le P_T\}$. Note that the trees $T^*$ remain convex, but in general they are not
pairwise disjoint. Call $\mathcal{F}_n^*$ the collection of the trees $T^*$, and note that $I_{T^*} = I_T$.

Define the exceptional set

\[ F = \bigcup_{2^{-n} \lesssim 1} \bigcup_{T^* \in \mathcal{F}_n^*} \{x : |C_{T^*} f(x)| > \lambda 2^{-n/2}\}. \]

Note that by (8) and (9) we have

\[ |\{x : |C_{T^*} f(x)| > \lambda 2^{-n/2}\}| \lesssim |I_T| (\lambda 2^{n/2})^{-p}. \]

Combining this with (10) and $p > 4$, we obtain $|F| \lesssim \lambda^{-p} |E|$.
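
For the reader's convenience, the summation behind this bound can be spelled out. The following is a sketch that uses only the previous display, (10), and the inequality $\|f\|_2^2 \le |E|$ (valid since $|f| \le 1_E$); the sum runs over the indices $n$ with $2^{-n} \lesssim 1$, so $n$ is bounded from below and the geometric series converges precisely because $p > 4$.

```latex
|F| \le \sum_{2^{-n} \lesssim 1} \sum_{T \in \mathcal{F}_n}
        |\{x : |C_{T^*} f(x)| > \lambda 2^{-n/2}\}|
    \lesssim \lambda^{-p} \sum_{2^{-n} \lesssim 1} 2^{-np/2} \sum_{T \in \mathcal{F}_n} |I_T|
    \lesssim \lambda^{-p} \|f\|_2^2 \sum_{2^{-n} \lesssim 1} 2^{(2 - p/2) n}
    \lesssim \lambda^{-p} |E|.
```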

Thus it remains to prove that $|C_{\mathbf{P}} f(x)| \lesssim \lambda$ on $F^c$. The crucial observation behind
this approach to Theorem 2.6 is that for each $n$ the contribution to each $x$ comes from a
single tree $T^* \in \mathcal{F}_n^*$. Indeed, note that the contributing bitiles $P \in \mathbf{P}_n$ satisfy
$(x, N(x)) \in I_P \times \omega_{P_l}$. Since all these bitiles $P$ contain $(x, N(x))$, they will be pairwise comparable
under $\le$. Call $P_x$ the unique maximal bitile among them. Let $T^*$ be one of the trees in
$\mathcal{F}_n^*$ containing $P_x$. It follows that all the contributing bitiles belong to $T^*$. Thus, if $x \notin F$

\[ |C_{\mathbf{P}_n} f(x)| = |C_{T^*} f(x)| \le \lambda 2^{-n/2}. \]

It further follows by linearity that

\[ |C_{\mathbf{P}} f(x)| \le \sum_{2^{-n} \lesssim 1} |C_{\mathbf{P}_n} f(x)| \lesssim \lambda. \]

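The case $0 < \lambda \le 1$ is very similar but we need an additional exceptional set

\[ G := \{x : M(f)(x) > \lambda^p\}. \]

Since $|G| \lesssim \lambda^{-p} |E|$ and since $W_{P_l}$ is supported on $I_P$, it is enough to prove (11) with $\mathbf{P}$
restricted to those bitiles such that $I_P \not\subseteq G$. Another application of Proposition 4.3 shows
that $\mathrm{size}_f(\mathbf{P}) \lesssim \lambda^p$. Let $\mathbf{P}_n$, $\mathcal{F}_n$ with $2^{-n} \lesssim \lambda^p$, be the collections from Proposition 4.6
corresponding to our new $\mathbf{P}$. Define

\[ F = \bigcup_{2^{-n} \lesssim \lambda^p} \bigcup_{T^* \in \mathcal{F}_n^*} \{x : |C_{T^*} f(x)| > \lambda^{1/2} 2^{-\frac{n}{2p}}\}. \]

As before (8) and (9) imply

\[ |\{x : |C_{T^*} f(x)| > \lambda^{1/2} 2^{-\frac{n}{2p}}\}| \lesssim |I_T| \Big( \frac{2^{-n}}{\lambda^{1/2} 2^{-\frac{n}{2p}}} \Big)^{\frac{2p}{p-1}}. \]

We immediately get that $|F| \lesssim |E| \le \lambda^{-p} |E|$. Note also that if $x \notin F$

\[ |C_{\mathbf{P}} f(x)| \le \sum_{2^{-n} \lesssim \lambda^p} |C_{\mathbf{P}_n} f(x)| \le \sum_{2^{-n} \lesssim \lambda^p} \lambda^{1/2} 2^{-\frac{n}{2p}} \lesssim \lambda. \]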



5. Mass and Fefferman's argument 

The argument in this section is a translation to the Walsh case of Fefferman's proof 
[10]. The key tool used in this proof is mass. 

Definition 5.1. The mass of a convex collection $\mathbf{P}$ of bitiles is defined as

\[ \mathrm{mass}(\mathbf{P}) = \sup_{P \in \mathbf{P}} \frac{|E(P)|}{|I_P|} \]

where $E(P) = I_P \cap N^{-1}(\omega_P)$.

In some sense the mass of a single bitile $P$ measures (or better said, it puts an upper
bound on, since $\omega_{P_l} \subset \omega_P$) how much $P$ contributes to $C_{\mathbf{P}} f(x)$. Indeed, it suffices to
note that for $1 \le p < \infty$ we have $\| C_P f \|_p \le \mathrm{mass}(P)^{1/p} \| f \|_p$. We will next extend this
inequality to the case of trees and then to a special type of forests.
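
The single bitile bound $\| C_P f \|_p \le \mathrm{mass}(P)^{1/p} \| f \|_p$ is easy to test numerically. The sketch below makes assumptions beyond the text: $f$ and the choice function $N$ are taken constant on the $2^{L}$ dyadic cells of $[0,1)$ and passed as lists, and a bitile is a hypothetical triple `(s, m, k)` with $I_P = [m 2^{-s}, (m+1) 2^{-s})$ and $\omega_P = [k 2^{s+1}, (k+1) 2^{s+1})$.

```python
def walsh(n, levels):
    """Cell values of w_n on the 2**levels dyadic cells of [0, 1); needs n < 2**levels."""
    if n == 0:
        return [1] * (2 ** levels)
    half = walsh(n // 2, levels - 1)
    sign = 1 if n % 2 == 0 else -1
    return half + [sign * v for v in half]


def single_bitile_check(f, N, P, p):
    """Return (||C_P f||_p, mass(P)**(1/p) * ||f||_p) for the bitile P = (s, m, k)."""
    s, m, k = P
    size = len(f)
    levels = size.bit_length() - 1
    width = size >> s
    cells = range(m * width, (m + 1) * width)       # cells inside I_P
    w_up = walsh(2 * k + 1, levels - s)
    scale = 2 ** (s / 2)                            # |W_{P_u}| on I_P
    # <f, W_{P_u}> as an integral over [0, 1)
    coeff = sum(f[i] * scale * w_up[i - m * width] for i in cells) / size
    lo = k * 2 ** (s + 1)                           # left endpoint of omega_P
    mid = (2 * k + 1) * 2 ** s                      # right endpoint of omega_{P_l}
    hi = (k + 1) * 2 ** (s + 1)                     # right endpoint of omega_P
    in_E = [i for i in cells if lo <= N[i] < hi]    # E(P) = I_P cap N^{-1}(omega_P)
    mass = len(in_E) / width
    # C_P f(x) = <f, W_{P_u}> W_{P_u}(x) 1_{omega_{P_l}}(N(x))
    hits = sum(1 for i in cells if lo <= N[i] < mid)
    lhs = (hits * abs(coeff * scale) ** p / size) ** (1 / p)
    rhs = mass ** (1 / p) * (sum(abs(v) ** p for v in f) / size) ** (1 / p)
    return lhs, rhs
```

For any choice of `f`, `N`, bitile and exponent `p >= 1`, the first returned value should not exceed the second, which is Hölder's inequality in disguise.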

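We have the following analogue of Lemma 4.5. We will restrict attention to the bitiles
spatially supported in $[0, 1]$

\[ \mathbf{P}_{[0,1]} := \{P \in \mathbf{P}_{all} : I_P \subset [0, 1]\}. \]

Lemma 5.2. Let $\mathbf{P}$ be a finite convex collection of bitiles in $\mathbf{P}_{[0,1]}$. Then $\mathbf{P} = \mathbf{P}_{lo} \cup \mathbf{P}_{hi}$,
such that

• both $\mathbf{P}_{lo}$ and $\mathbf{P}_{hi}$ are convex

• $\mathrm{mass}(\mathbf{P}_{lo}) \le \frac{1}{2} \mathrm{mass}(\mathbf{P})$

• $\mathbf{P}_{hi}$ is a convex forest with trees $T \in \mathcal{F}$ satisfying

\[ \sum_{T \in \mathcal{F}} |I_T| \lesssim \mathrm{mass}(\mathbf{P})^{-1}. \]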

Proof. This is another recursive procedure. Set $\mathbf{P}_{stock} := \mathbf{P}$. Select a maximal bitile
$t \in \mathbf{P}_{stock}$ such that

\[ \mathrm{mass}(t) > \frac{\mathrm{mass}(\mathbf{P})}{2}. \]

Define

\[ T(t) = \{P \in \mathbf{P}_{stock} : P \le t\} \]

and note that since $\mathbf{P}_{stock}$ is convex, both $\mathbf{P}_{stock} \setminus T(t)$ and the tree $T(t)$ will be convex.
Add $T(t)$ to the family $\mathcal{F}$. Reset the new value $\mathbf{P}_{stock} := \mathbf{P}_{stock} \setminus T(t)$, and restart the
procedure.

The algorithm is over when there is no $t$ to be selected. Then define $\mathbf{P}_{hi}$ to consist of
the union of all bitiles in all the trees from $\mathcal{F}$, and let $\mathbf{P}_{lo} = \mathbf{P} \setminus \mathbf{P}_{hi}$.

The first two needed properties as well as the convexity of $T$ are quite immediate. By
maximality the selected bitiles $t$ are pairwise disjoint, and thus the sets $E(t)$ will also be
pairwise disjoint. Since $E(t) \subset [0, 1]$, it follows that

\[ \sum_{T \in \mathcal{F}} |I_T| = \sum_t |I_t| \le 2\, \mathrm{mass}(\mathbf{P})^{-1} \sum_t |E(t)| \le 2\, \mathrm{mass}(\mathbf{P})^{-1}. \]

■

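Note that the mass of any collection is trivially bounded by 1. We can iterate the
lemma to obtain the following analogue of Proposition 4.6.

Proposition 5.3 (Mass decomposition). Let $\mathbf{P} \subset \mathbf{P}_{[0,1]}$ be a finite convex collection of
bitiles. Then

\[ \mathbf{P} = \bigcup_{n \ge 0} \mathbf{P}_n \cup \mathbf{P}_{null}, \]

such that

• $\mathrm{mass}(\mathbf{P}_n) \le 2^{-n}$

• $\mathbf{P}_n$ is a convex forest with trees $T \in \mathcal{F}_n$ satisfying

\[ \sum_{T \in \mathcal{F}_n} |I_T| \lesssim 2^n \]

• $C_P f = 0$ for each $P \in \mathbf{P}_{null}$ and each $f$.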

It turns out that we have not only $L^1$ but also dyadic $BMO_\Delta$ control for $N_{\mathcal{F}_n}$. Recall
that

\[ \| f \|_{BMO_\Delta} = \sup_{I \text{ dyadic}} \frac{1}{|I|} \int_I \Big| f(x) - \frac{1}{|I|} \int_I f \Big|\, dx. \]

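Proposition 5.4. We have

\[ \| N_{\mathcal{F}_n} \|_{BMO_\Delta} \lesssim 2^n. \]

In particular, for each $I \in \mathcal{D}_+$ and $\lambda > 0$

\[ \Big| \Big\{ x : \sum_{T \in \mathcal{F}_n:\ I_T \subseteq I} 1_{I_T}(x) > C \lambda 2^n \Big\} \Big| \lesssim e^{-\lambda} |I|, \]

where $C$ is large enough.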

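Proof. The first inequality follows since one can easily check that

\[ \| N_{\mathcal{F}_n} \|_{BMO_\Delta} \lesssim \sup_{I \in \mathcal{D}_+} \frac{1}{|I|} \sum_{T \in \mathcal{F}_n:\ I_T \subseteq I} |I_T|. \]

The second one is just a consequence of John-Nirenberg's inequality.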

Given a convex tree $T$ with top data $(I_T, \xi_T)$, define $\mathcal{J}_T$ to be the collection of all
maximal dyadic intervals $J \subset I_T$ such that $J$ contains no $I_P$ with $P \in T$. For $J \in \mathcal{J}_T$
define

\[ G_J = J \cap \Big( \bigcup_{P \in T:\ J \subset I_P} E(P) \Big). \]

We first observe that the intervals of $\mathcal{J}_T$ form a partition of $I_T$. Note also that if $J$
intersects some $I_P$ with $P \in T$, then the convexity of $T$ forces the dyadic parent of each
$J \in \mathcal{J}_T$ to equal $I_{P(J)}$, for some $P(J) \in T$. Thus $G_J \subset E(P(J))$, which implies the
following crucial Carleson measure type estimate

\[ |G_J| \le 2\, \mathrm{mass}(T) |J|, \tag{12} \]

for each $J \in \mathcal{J}_T$. We also need to observe that $C_T f$ is supported on $\bigcup_{J \in \mathcal{J}_T} G_J$.

Proposition 5.5 (Tree estimate). Let $T$ be a convex tree. Then for each $1 < p < \infty$

\[ \| C_T f \|_p \lesssim (\mathrm{mass}(T))^{1/p} \| f \|_p. \]

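Proof. Assume first that $T$ is overlapping. Note that for each $J \in \mathcal{J}_T$ and $x \in J$, if
$C_T f(x)$ is nonzero then $|C_T f(x)| = |I_P|^{-1/2} |\langle f, W_{P_u} \rangle|$ for some $P \in T$ with $J \subset I_P$. Thus
$|C_T f(x)| \le \inf_{y \in J} M(f)(y)$. We conclude that

\[ \int |C_T f(x)|^p\, dx = \sum_{J \in \mathcal{J}_T} \int_{G_J} |C_T f(x)|^p\, dx \le \sum_{J \in \mathcal{J}_T} |G_J| \inf_{y \in J} [M(f)(y)]^p \]
\[ \le 2\, \mathrm{mass}(T) \sum_{J \in \mathcal{J}_T} \int_J |Mf(x)|^p\, dx = 2\, \mathrm{mass}(T) \| Mf \|_{L^p(I_T)}^p \lesssim \mathrm{mass}(T) \| f \|_p^p. \]

On the other hand, if $T$ is lacunary we reason like in the proof of Lemma 3.3 to write

\[ C_T f(x) = \mathcal{F}_W^{-1}(1_\omega \mathcal{F}_W(O_T f))(x), \]

where the interval $\omega$ is any interval which contains all $\omega_{P_u}$ with $x \in I_P$. Recall however
that $|I_P| \ge 2|J|$ for all such $P$ and hence, since $\xi_T \in \omega_{P_l}$, all $\omega_{P_u}$ will be contained in the
interval $[\xi_T, \xi_T + |J|^{-1}]$. Thus

\[ |C_T f(x)| \le \frac{1}{|J|} \int_{x \oplus [0, |J|]} |O_T f|(y)\, dy \lesssim \inf_{y \in J} M(O_T f)(y). \]

A repetition of the argument from the overlapping case combined with Lemma 3.2 ends
the proof. ■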

The proof of this proposition shows that for each $J \in \mathcal{J}_T$

\[ \| C_T f \|_{L^\infty(J)} \lesssim \inf_{x \in J} M(V_T f)(x) \tag{13} \]

where $V_T f = \Pi_T f$ if $T$ is overlapping and $V_T f = O_T(\Pi_T f)$ if $T$ is lacunary.

Definition 5.6. We call $\mathcal{F}$ a Fefferman forest if any two bitiles from two distinct trees
in $\mathcal{F}$ are disjoint.

It turns out that outside a small exceptional set, each forest $\mathcal{F}$ can be written as a
disjoint union of a small number of Fefferman forests.
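
Lemma 5.7 (The Fefferman trick). Assume $\mathbf{P}$ is a convex set in $\mathbf{P}_{[0,1]}$, which is organized
as a forest $\mathcal{F}$. For each $K > 1$ there is an exceptional set $F$ with $|F| \lesssim e^{-K}$ such that

• we have

\[ \Big\| \sum_{T \in \mathcal{F}:\ I_T \not\subseteq F} 1_{I_T} \Big\|_\infty \lesssim K \| N_{\mathcal{F}} \|_{BMO_\Delta} \tag{14} \]

• the bitiles in $\mathbf{P}_F := \bigcup_{T \in \mathcal{F}:\ I_T \not\subseteq F} \{P : P \in T\}$ can be partitioned into $O(\log(K \| N_{\mathcal{F}} \|_{BMO_\Delta}))$
Fefferman forests.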

Proof. Define $F = \{x : \sum_{T \in \mathcal{F}} 1_{I_T}(x) > C K \| N_{\mathcal{F}} \|_{BMO_\Delta}\}$, for large enough $C$. Note that
$|F| \lesssim e^{-K}$ by John-Nirenberg's inequality. Call $\mathcal{F}_F := \{T \in \mathcal{F} : I_T \not\subseteq F\}$. Then (14)
is immediate. Let $P_T$ be any top bitile for $T$. Define for each $l \in \mathbb{N}$ with
$1 \le 2^l \le C K \| N_{\mathcal{F}} \|_{BMO_\Delta}$

\[ \mathbf{P}_F^l := \{P \in \mathbf{P}_F : 2^l \le \#\{T \in \mathcal{F}_F : P \le P_T\} < 2^{l+1}\}. \]

It remains to prove that $\mathbf{P}_F^l$ is a Fefferman forest. Note that $\mathbf{P}_F^l$ is convex, in part because
$\mathbf{P}$ is convex. For each maximal element $t \in \mathbf{P}_F^l$, let $T(t) = \{P \in \mathbf{P}_F^l : P \le t\}$. Obviously
each tree $T(t)$ is convex. Assume for contradiction $P \le P'$ for some $P \in T(t)$, $P' \in T(t')$
with $t \ne t'$. Then $P \le t'$, in addition to $P \le t$. Thus $I_t$ intersects $I_{t'}$. But then, since $t$ and
$t'$ are incomparable under $\le$, the sets $\{T \in \mathcal{F}_F : t \le P_T\}$ and $\{T \in \mathcal{F}_F : t' \le P_T\}$
must be disjoint. Note that

\[ \{T \in \mathcal{F}_F : t \le P_T\} \cup \{T \in \mathcal{F}_F : t' \le P_T\} \subset \{T \in \mathcal{F}_F : P \le P_T\}, \]

and this will force the contradiction $2^l + 2^l < 2^{l+1}$. ■

We will repeatedly use the fact that if $T, T'$ are trees in a Fefferman forest then $C_T f$
and $C_{T'} f$ are disjointly supported while $\Pi_T f$ and $\Pi_{T'} f$ are orthogonal.

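Proposition 5.8 (Forest estimate). Let $\mathcal{F}$ be a Fefferman forest. Then for each $1 < p < \infty$
there exists $\delta(p) < \frac{1}{p}$ such that

\[ \| C_{\mathcal{F}} f \|_p \lesssim (\mathrm{mass}(\mathcal{F}))^{1/p} \| N_{\mathcal{F}} \|_\infty^{\delta(p)} \| f \|_p. \]

When $p \ge 2$ we can take $\delta(p) = 0$.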
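Proof. Let first $p \ge 2$. Consider the vector valued operator $V f = (\Pi_T f)_{T \in \mathcal{F}}$. Proposition 3.4
shows that $\| V \|_{L^\infty \to L^\infty(l^\infty)} \lesssim 1$. On the other hand, the pairwise orthogonality of $\Pi_T f$
implies the bound $\| V \|_{L^2 \to L^2(l^2)} \lesssim 1$. Interpolation [3] now gives

\[ \| V \|_{L^p \to L^p(l^p)} \lesssim 1. \tag{15} \]

Using Proposition 5.5 we get

\[ \| C_{\mathcal{F}} f \|_p^p = \sum_{T \in \mathcal{F}} \| C_T f \|_p^p \lesssim \sum_{T \in \mathcal{F}} \mathrm{mass}(T) \| \Pi_T f \|_p^p \lesssim \mathrm{mass}(\mathcal{F}) \| f \|_p^p. \]

Assume next that $p < 2$. Split $\mathcal{F}$ into $\| N_{\mathcal{F}} \|_\infty$ forests $\mathcal{F}_k$ so that for each $k$ and each
$T, T' \in \mathcal{F}_k$ we have $I_T \cap I_{T'} = \emptyset$. Note that we have as before

\[ \| C_{\mathcal{F}} f \|_p^p = \sum_k \sum_{T \in \mathcal{F}_k} \| C_T f \|_p^p \lesssim \sum_k \sum_{T \in \mathcal{F}_k} \mathrm{mass}(T) \| f 1_{I_T} \|_p^p \]
\[ \le \mathrm{mass}(\mathcal{F}) \sum_k \| f \|_p^p = \mathrm{mass}(\mathcal{F}) \| N_{\mathcal{F}} \|_\infty \| f \|_p^p. \]

To close the argument, interpolate the $L^p$ and $L^2$ bounds for the operator $f \mapsto C_{\mathcal{F}} f$. ■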

We are now ready to prove the following variant of Theorem 2.6. Note that this weaker 
version is enough to prove Theorem 1.2, via the standard approximation argument. Then, 
in the case 1 < p < 2 one can actually use Stein's Continuity Principle [20] to show that 
Theorem 1.2 implies the stronger Theorem 2.6. The author is not aware of any substitute 
argument which proves the same implication for p > 2. However, in Section 7 we present 
a recent delicate refinement of Fefferman's argument due to Lie [18] which closes this gap. 

Theorem 5.9. For each $1 < q < p < \infty$ we have

\[ \| C_{\mathbf{P}_{[0,1]}} f \|_q \lesssim \| f \|_p. \]

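Proof. By invoking a limiting argument, it suffices to prove the estimate with $\mathbf{P}_{[0,1]}$
replaced by a finite convex collection $\mathbf{P} \subset \mathbf{P}_{[0,1]}$, as long as the implicit constant in the
inequality is independent of $\mathbf{P}$. Apply Proposition 5.3 to $\mathbf{P}$ to get $\mathbf{P}_n$, $\mathcal{F}_n$. Apply Lemma
5.7 to each $\mathcal{F}_n$ with $K = K_n := (n + 1)L$, $L > 2$ to be determined later. We get exceptional
sets $|F_n| \lesssim e^{-L(n+1)}$ and the partitions of $\mathcal{F}_{F_n, n} := \{T \in \mathcal{F}_n : I_T \not\subseteq F_n\}$ into $O(n + \log L)$
Fefferman forests $\mathcal{F}_{F_n, n, k}$. Define $F = \bigcup_n F_n$ and note that $|F| \lesssim e^{-L}$. Let also $\mathbf{P}^*$ be the
bitiles in the trees from $\bigcup_n \mathcal{F}_{F_n, n}$. Note that $C_{\mathbf{P}} f(x) = C_{\mathbf{P}^*} f(x)$ on $F^c$. By Proposition
5.8 and linearity we get

\[ \| C_{\mathbf{P}^*} f \|_p \le \sum_n \sum_k \| C_{\mathcal{F}_{F_n, n, k}} f \|_p \lesssim \sum_n 2^{-n/p} (n + \log L) ((n + 1) 2^n L)^{\delta(p)} \| f \|_p \lesssim L^{1/p} \| f \|_p. \]

Thus, for each $\lambda > 1$

\[ |\{x \in [0, 1] : |C_{\mathbf{P}} f(x)| > \lambda\}| \lesssim L \Big( \frac{\| f \|_p}{\lambda} \Big)^p + e^{-L}. \]

By optimizing $L$ we get that for each $s < p$

\[ |\{x \in [0, 1] : |C_{\mathbf{P}} f(x)| > \lambda\}| \lesssim \Big( \frac{\| f \|_p}{\lambda} \Big)^s. \]

A simple integration argument finishes the proof of the theorem.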



6. Combining mass and size: The Lacey-Thiele argument 

The Fourier version of this argument is due to Lacey and Thiele [13]. It uses both 
mass and size and thus it combines elements of the two proofs we have seen in earlier 
sections. The interplay between mass and size made this approach particularly well suited 
for applications to the problem of singular integrals along vector fields [15], [1], [2], since 
it opened the door to the use of Kakeya type maximal functions. 

The definition of size($\mathbf{P}$) will remain the same as in Section 4. Let $F$ be a finite measure
subset of $\mathbb{R}_+$. We modify slightly the definition of mass.

Definition 6.1. The mass of a convex collection $\mathbf{P}$ of bitiles relative to $F$ is defined as

$$\mathrm{mass}_F(\mathbf{P}) = \sup_{P \in \mathbf{P}} \frac{|E_F(P)|}{|I_P|}$$

where now $E_F(P) = F \cap I_P \cap N^{-1}(\omega_P)$.

We have the following version of Proposition 5.3. 

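Proposition 6.2 (Mass decomposition). Let $\mathbf{P} \subset \mathbf{P}_{all}$ be a finite convex collection of
bitiles. Then

$$\mathbf{P} = \bigcup_{m \ge 0} \mathbf{P}^*_m \cup \mathbf{P}^*_{null}$$

such that

• $\mathrm{mass}_F(\mathbf{P}^*_m) \le 2^{-m}$

• $\mathbf{P}^*_m$ is a convex forest with trees $\mathbf{T} \in \mathcal{F}^*_m$ satisfying

$$\sum_{\mathbf{T} \in \mathcal{F}^*_m} |I_{\mathbf{T}}| \lesssim 2^m |F|,$$

• $C_P f \equiv 0$ for each $P \in \mathbf{P}^*_{null}$ and each $f : \mathbb{R}_+ \to \mathbb{C}$.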

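Proposition 6.3 (Tree estimate). Let $\mathbf{T}$ be a convex tree. Then for each $f \in L^2(\mathbb{R}_+)$

$$|\langle C_{\mathbf{T}} f, 1_F \rangle| \lesssim |I_{\mathbf{T}}| \, \mathrm{mass}_F(\mathbf{T}) \, \mathrm{size}_f(\mathbf{T}).$$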

Proof For J G Jr define as before Gj = J H (|J PeT:Jc/p Ep(P)). Recall that Cp/lp 
restricted to J is supported on Gj. We then estimate like in the proof of Proposition 5.5 

\(C T f,l F )\ < E / l^/l ^ m assp(T) E / M(F T (/)) < 
JeJ T Gj JeJ T 3 

massp(T) E \A lj \ [ \M{V T {f))\ 2 f/ 2 < massp(T)|/ T | 1 / 2 ||\/ r (/)|| 2 , 

J£J T 3 

where Vrf = rip/ if T is overlapping and Vp = Op(ITp/) is T is lacunary. The result 
now follows from (9) and Lemma 3.2. ■ 

Proof [of Theorem 2.6] By using restricted type interpolation and a limiting argument,
it suffices to prove that given $1 < p < \infty$, $p \ne 2$ with dual exponent $p'$ and given finite
measure subsets $E$, $G$ of $\mathbb{R}_+$, there exists $F \subset G$ with $|F| \ge |G|/2$ such that

$$|\langle C_{\mathbf{P}} f, 1_F \rangle| \lesssim |E|^{1/p} |F|^{1/p'} \qquad (16)$$

for each $|f| \le 1_E$ and each finite convex $\mathbf{P} \subset \mathbf{P}_{all}$.

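We analyze first the case $p > 2$, when we can take $F = G$. Note that $\mathrm{size}_f(\mathbf{P}) \lesssim 1$.
Apply to $\mathbf{P}$ Propositions 4.6 and 6.2 to get $\mathbf{P}_n$, $\mathbf{P}^*_m$, $\mathcal{F}_n$, $\mathcal{F}^*_m$ for $m \ge 0$ and $2^{-n} \lesssim 1$. Define
$\mathbf{P}_{n,m} := \mathbf{P}_n \cap \mathbf{P}^*_m$. We organize the bitiles in $\mathbf{P}_{n,m}$ into a forest in two different ways. Call
$\mathcal{F}_{n,m}$ the trees $\mathbf{T} \cap \mathbf{P}^*_m$ with $\mathbf{T} \in \mathcal{F}_n$ and call $\mathcal{F}^*_{n,m}$ the trees $\mathbf{T} \cap \mathbf{P}_n$ with $\mathbf{T} \in \mathcal{F}^*_m$. Note
that the resulting trees are convex and

$$\sum_{\mathbf{T} \in \mathcal{F}_{n,m}} |I_{\mathbf{T}}| \lesssim 2^{2n} |E|,$$

$$\sum_{\mathbf{T} \in \mathcal{F}^*_{n,m}} |I_{\mathbf{T}}| \lesssim 2^{m} |F|.$$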

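Thus, using Proposition 6.3 with the two partitions above, we get

$$|\langle C_{\mathbf{P}_{n,m}} f, 1_F \rangle| \lesssim 2^{-n} 2^{-m} \min(2^{2n}|E|, 2^m|F|).$$

We conclude that

$$|\langle C_{\mathbf{P}} f, 1_F \rangle| \le \sum_{n,m} |\langle C_{\mathbf{P}_{n,m}} f, 1_F \rangle| \lesssim \sum_{2^{-n} \lesssim 1,\, m \ge 0} 2^{-n} 2^{-m} (2^{2n}|E|)^{1/p} (2^m|F|)^{1/p'} \lesssim |E|^{1/p} |F|^{1/p'}.$$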
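
The last summation rests on the elementary inequality $\min(a,b) \le a^{1/p} b^{1/p'}$, after which the sums in $n$ and $m$ become geometric series (convergent in $n$ precisely because $p > 2$). The following is a minimal numerical sketch of that step; the function names and the truncation levels `n_max`, `m_max` are assumptions made for illustration.

```python
def double_sum(E: float, F: float, n_max: int = 60, m_max: int = 60) -> float:
    """Truncated sum over n, m >= 0 of 2^{-n} 2^{-m} min(2^{2n} E, 2^{m} F)."""
    total = 0.0
    for n in range(n_max):
        for m in range(m_max):
            total += 2.0 ** (-n - m) * min(4.0 ** n * E, 2.0 ** m * F)
    return total


def geometric_bound(E: float, F: float, p: float) -> float:
    """Bound C_p * E^{1/p} * F^{1/p'} obtained from min(a, b) <= a^{1/p} b^{1/p'}."""
    if p <= 2.0:
        raise ValueError("the geometric series in n converges only for p > 2")
    c_n = 1.0 / (1.0 - 2.0 ** (2.0 / p - 1.0))   # sum over n of 2^{-n(1 - 2/p)}
    c_m = 1.0 / (1.0 - 2.0 ** (-1.0 / p))        # sum over m of 2^{-m/p}
    return c_n * c_m * E ** (1.0 / p) * F ** (1.0 - 1.0 / p)
```

The constant `c_n * c_m` blows up as $p \to 2$, which is consistent with the case $p = 2$ being excluded from the restricted type estimate.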

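Consider now the case $p < 2$. We define

$$F := G \setminus \{x : M(1_E)(x) > 10\tfrac{|E|}{|G|}\}.$$

Note that $|F| \ge |G|/2$. Call $\mathbf{P}^* = \{P \in \mathbf{P} : I_P \not\subset F^c\}$ and note that $\langle C_{\mathbf{P}} f, 1_F \rangle =
\langle C_{\mathbf{P}^*} f, 1_F \rangle$. Proposition 4.3 implies the bound $\mathrm{size}_f(\mathbf{P}^*) \lesssim \frac{|E|}{|G|}$.

Proceeding as before, with the two partitions this time for $\mathbf{P}^*$, we obtain

$$|\langle C_{\mathbf{P}^*} f, 1_F \rangle| \le \sum_{n,m} |\langle C_{\mathbf{P}^*_{n,m}} f, 1_F \rangle| \lesssim \sum_{2^{-n} \lesssim \frac{|E|}{|G|},\, m \ge 0} 2^{-n} 2^{-m} (2^{2n}|E|)^{1/p'} (2^m|F|)^{1/p} \lesssim |E|^{1/p} |F|^{1/p'}.$$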



7. A DIRECT PROOF OF STRONG $L^p$ BOUNDS

All three arguments presented earlier rely on proving weak type bounds and using
interpolation. Moreover, as observed in Section 5, Fefferman's argument does not imply
strong type $L^p$ bounds for the operator $C_{\mathbf{P}_{[0,1]}}$ when $p > 2$. The recent argument of Lie
[18] proves that

$$\|C_{\mathbf{P}_{[0,1]}} f\|_p \lesssim \|f\|_{L^p([0,1])} \qquad (17)$$

directly, without any interpolation that restricts $f$ to special classes of functions (such as
sub-characteristic). The case $p = 2$ will require no interpolation in the argument, while
the case $p \ne 2$ will rely on vector valued interpolation for various operators relevant to
the proof. We will restrict attention to $p > 2$.

This argument, while also interesting in itself, has been developed in [18] in order to 
solve a conjecture of Stein on the boundedness of the quadratic Carleson operator. 

We will keep the notation for mass from Section 5 and will restrict attention to the bitiles
in $\mathbf{P}_{[0,1]}$. Recall (see Lemma 5.7) that Fefferman's approach consisted of decomposing
every forest into a small number of Fefferman forests, outside a small exceptional set. The 
lemma below shows how to iterate the Fefferman trick inside the exceptional set, until all 
bitiles are exhausted. Each stage of the iteration creates more layers of Fefferman forests, 
but the size of their spatial support will get exponentially smaller. 
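Lemma 7.1. Let $\mathbf{P} \subset \mathbf{P}_{[0,1]}$ be a finite convex collection of bitiles. Then we have the
following partitions

$$\mathbf{P} = \bigcup_{k \ge 0} \mathbf{P}_k \qquad (18)$$

$$\mathbf{P}_k = \bigcup_{m \ge 1} \mathbf{Q}_k^m \qquad (19)$$

$$\mathbf{Q}_k^m = \bigcup_{1 \le n \le k} \mathcal{F}_k^{m,n} \qquad (20)$$

where each $\mathcal{F}_k^{m,n}$ is a Fefferman forest such that

$$\|N_{\mathcal{F}_k^{m,n}}\|_\infty \lesssim k 2^k \qquad (21)$$

and

$$\mathrm{mass}(\mathbf{P} \setminus \bigcup_{i < k} \mathbf{P}_i) \le 2^{1-k}. \qquad (22)$$

Moreover, there are sets $(E_k^m)_{m \ge 0, k \ge 0}$ which are finite disjoint unions $\mathcal{J}_k^m$ of dyadic intervals
such that

$$E_k^{m+1} \subset E_k^m \text{ and } |E_k^{m+1} \cap J| \le e^{-10k} |J|, \text{ for } J \in \mathcal{J}_k^m \qquad (23)$$

$$P \in \mathbf{Q}_k^{m+1} \text{ implies } I_P \subset E_k^m,\ I_P \not\subset E_k^{m+1} \qquad (24)$$

$$P \in \mathbf{Q}_k^{m+1} \text{ implies } |I_P \cap E_k^{m+2}| \le e^{-10k} |I_P| \qquad (25)$$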

Proof Set $\mathbf{P}_0 := \emptyset$. Assume that given $k > 0$, each $\mathbf{P}_i$ has been defined for each
$0 \le i \le k-1$ so that it satisfies (19) - (25).

Let us see how to define $\mathbf{P}_k$. For the rest of the argument set $\mathbf{P}_{stock} = \mathbf{P} \setminus (\bigcup_{0 \le i \le k-1} \mathbf{P}_i)$.
Let $\mathbf{P}^0_{k,max}$ be the maximal bitiles in the set

$$\{P \in \mathbf{P}_{stock} : \mathrm{mass}(P) > 2^{-k}\}.$$

Define $E^0_k := [0,1]$,

$$\mathbf{P}^0_k = \{P \in \mathbf{P}_{stock} : P \le P' \text{ for some } P' \in \mathbf{P}^0_{k,max}\},$$

$$N^0_k = \sum_{P \in \mathbf{P}^0_{k,max}} 1_{I_P}$$

and the exceptional set

$$E^1_k = \{x : N^0_k(x) > C_1 k 2^k\}$$

where $C_1$ is a large enough constant to be determined later. Note that (22) holds and
hence $\|N^0_k\|_{BMO_\Delta} \lesssim 2^{1-k}$. This and John-Nirenberg in turn imply (23) for $m = 0$. Define

$$\mathbf{Q}^1_k = \{P \in \mathbf{P}^0_k : I_P \not\subset E^1_k\}$$

and note that (24) holds for $m = 0$.

We continue the construction of the sets $E_k^m$, $\mathbf{Q}_k^m$ inductively. Fix $m_0 \ge 1$. Assume
$E_k^0, \ldots, E_k^{m_0}$ and $\mathbf{Q}_k^1, \ldots, \mathbf{Q}_k^{m_0}$ have been constructed so that (23)-(24) hold for each $0 \le
m \le m_0 - 1$ and so that (25) holds for each $m \le m_0 - 2$. Define $\mathbf{P}^{m_0}_{k,max}$ to be the maximal
bitiles in the set

$$\{P \in \mathbf{P}_{stock} : I_P \subset E_k^{m_0},\ \mathrm{mass}(P) > 2^{-k}\}.$$

Let also

$$\mathbf{P}_k^{m_0} = \{P \in \mathbf{P}_{stock} : P \le P' \text{ for some } P' \in \mathbf{P}^{m_0}_{k,max}\},$$

$$N_k^{m_0} = \sum_{P \in \mathbf{P}^{m_0}_{k,max}} 1_{I_P}$$

and the exceptional set

$$E_k^{m_0+1} = \{x : N_k^{m_0}(x) > C_1 k 2^k\}.$$

Finally, define

$$\mathbf{Q}_k^{m_0+1} = \{P \in \mathbf{P}_k^{m_0} : I_P \not\subset E_k^{m_0+1}\}.$$

Note that (23)-(24) hold when $m = m_0$, for the same reasons as before. To check (25)
for $m = m_0 - 1$, let $P \in \mathbf{Q}_k^{m_0}$. Since $I_P \not\subset E_k^{m_0}$, we have that either $I_P \cap E_k^{m_0} = \emptyset$, or
$I_P \cap E_k^{m_0}$ is a finite disjoint union of dyadic intervals $J \in \mathcal{J}_k^{m_0}$. Note that John-Nirenberg's
inequality guarantees that for each $J \in \mathcal{J}_k^{m_0}$ we have

$$|J \cap E_k^{m_0+1}| \le e^{-10k} |J|,$$

which immediately implies (25) for $m = m_0 - 1$.

It is easy to see that each $\mathbf{Q}_k^m$ is convex. It can be organized into a forest of trees with
tops in $\{P \in \mathbf{P}^{m-1}_{k,max} : I_P \not\subset E_k^m\}$.

Note that the counting function has a favorable $L^\infty$ bound

$$\Big\| \sum_{P \in \mathbf{P}^{m-1}_{k,max} :\ I_P \not\subset E_k^m} 1_{I_P} \Big\|_\infty \le C_1 k 2^k.$$

The forests $\mathcal{F}_k^{m,n}$ are now obtained via the Fefferman trick.

It is immediate that the collections $\mathbf{Q}_k^m$ are pairwise disjoint for fixed $k$, because of
(24). Note that the algorithm (for fixed $k$) will end with a finite value of $m$. We set $\mathbf{Q}_k^m$
to be empty for all larger values of $m$ and define $\mathbf{P}_k = \bigcup_{m \ge 1} \mathbf{Q}_k^m$. Finally, observe that
when the algorithm for fixed $k$ is over, there cannot be any $P$ left in $\mathbf{P}_{stock} \setminus \mathbf{P}_k$ with
$\mathrm{mass}(P) > 2^{-k}$. Indeed, note that $I_P \subset E_k^0$ trivially. While $I_P \subset E_k^m$ the algorithm will
continue to run. But by (23), there should exist $m$ such that $I_P \subset E_k^m$, $I_P \not\subset E_k^{m+1}$. Thus,
if $\mathrm{mass}(P) > 2^{-k}$, then $P$ gets automatically selected in $\mathbf{Q}_k^{m+1}$. ■

A distinct feature of the approach in this section is the almost orthogonality between 
a function with small support and a function locally constant on this support. 

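Lemma 7.2 (Almost orthogonality). Let $\mathcal{J}$ be a collection of pairwise disjoint intervals
and let $f : \bigcup_{J \in \mathcal{J}} J \to \mathbb{C}$ such that $|f|$ is constant on each $J \in \mathcal{J}$. Assume $E_J \subset J$ satisfies
$|E_J| \le \alpha |J|$ for each $J \in \mathcal{J}$. Then for each $g : \bigcup_{J \in \mathcal{J}} E_J \to \mathbb{C}$ we have

$$|\langle f, g \rangle| \le \alpha^{1/2} \|f\|_2 \|g\|_2.$$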
Proof

$$|\langle f, g \rangle| \le \sum_{J} \int_{E_J} |f||g| \le \sum_J |E_J|^{1/2} \sup_{x \in J} |f(x)| \, \|g\|_{L^2(E_J)}$$
$$\le \alpha^{1/2} \sum_J \|f\|_{L^2(J)} \|g\|_{L^2(E_J)} \le \alpha^{1/2} \|f\|_2 \|g\|_2.$$

■
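
The inequality of Lemma 7.2 can be tested in a discrete model. The sketch below is only an illustration under stated assumptions: each interval $J$ is modelled by `points` equally weighted sample points, $f$ has constant modulus and random signs on each $J$, and $g$ is a random function supported on the first $\lfloor \alpha \cdot \text{points} \rfloor$ samples of each $J$ (playing the role of $E_J$). All names and parameters are hypothetical.

```python
import math
import random


def almost_orthogonality_sides(num_intervals: int = 8, points: int = 64,
                               alpha: float = 0.125, seed: int = 0):
    """Return (|<f, g>|, alpha^{1/2} ||f||_2 ||g||_2) in a discrete model.

    The common normalising factor 1/points cancels on both sides, so plain
    sums are used instead of integrals.
    """
    rng = random.Random(seed)
    support = int(alpha * points)          # |E_J| <= alpha * |J|
    inner = norm_f_sq = norm_g_sq = 0.0
    for _ in range(num_intervals):
        modulus = rng.uniform(0.0, 5.0)    # |f| is constant on this interval
        for i in range(points):
            f = modulus * rng.choice((-1.0, 1.0))
            g = rng.gauss(0.0, 1.0) if i < support else 0.0
            inner += f * g
            norm_f_sq += f * f
            norm_g_sq += g * g
    return abs(inner), math.sqrt(alpha * norm_f_sq * norm_g_sq)
```

The gain $\alpha^{1/2}$ comes only from the size of the support of $g$ relative to the intervals on which $|f|$ is constant; no cancellation between $f$ and $g$ is used.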

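Proposition 7.3 (Almost orthogonality between forests). Fix $k$. For each $m \ge 0$, let
$\mathcal{F}^m$ denote one of the forests $\mathcal{F}_k^{m,n}$ from Lemma 7.1. Define

$$\Pi_{\mathcal{F}^m} = \sum_{\mathbf{T} \in \mathcal{F}^m} \Pi_{\mathbf{T}}.$$

Then for each $m' \ge m + 2$

$$|\langle \Pi_{\mathcal{F}^m} f, \Pi_{\mathcal{F}^{m'}} f \rangle| \lesssim e^{-(m'-m-2)k/2} \|f\|_2^2.$$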

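Proof Note that due to (21) we can split $\mathcal{F}^m$ into layers $\mathcal{F}^m = \bigcup_{l \lesssim k2^k} \mathcal{F}^m(l)$, so that for
each $l$ the trees $\mathbf{T}$ in $\mathcal{F}^m(l)$ have pairwise disjoint intervals $I_{\mathbf{T}}$.

Arguing like in the proof of Proposition 3.4 we first observe that for each $x$ we have
$\sum_{\mathbf{T} \in \mathcal{F}^m(l)} \Pi_{\mathbf{T}} f(x) = \Pi_P f(x)$, where $P$ is the minimal bitile in $\bigcup_{\mathbf{T} \in \mathcal{F}^m(l)} \mathbf{T}$ such that $x \in I_P$.
Second, by (24), for each interval $J \in \mathcal{J}_k^m$ and each $P \in \bigcup_{\mathbf{T} \in \mathcal{F}^m(l)} \mathbf{T}$, $J$ cannot contain either
the right or the left halves of $I_P$. Third, (1) guarantees that each $|\Pi_P f|$ is constant on both
the right and the left halves of $I_P$. We now conclude that the function $|\sum_{\mathbf{T} \in \mathcal{F}^m(l)} \Pi_{\mathbf{T}} f|$
is constant on each $J \in \mathcal{J}_k^m$. Applying Lemma 7.2 to $(\sum_{\mathbf{T} \in \mathcal{F}^m(l)} \Pi_{\mathbf{T}} f) 1_{E_k^m}$ and $\Pi_{\mathcal{F}^{m'}} f$, by
virtue of (23) we get

$$|\langle \Pi_{\mathcal{F}^m} f, \Pi_{\mathcal{F}^{m'}} f \rangle| \le \sum_l |\langle \Pi_{\mathcal{F}^m(l)} f, \Pi_{\mathcal{F}^{m'}} f \rangle|$$

$$\lesssim k 2^k e^{-10(m'-m-1)k/2} \|\Pi_{\mathcal{F}^{m'}} f\|_2 \|\Pi_{\mathcal{F}^m} f\|_2 \lesssim e^{-(m'-m-2)k/2} \|f\|_2^2.$$

■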

Lemma 7.4 (Schur's test). Let $T_m$ be a sequence of operators on $L^2([0,1])$ with adjoints
$T_m^*$, such that for each $m, m'$

$$\|T_m T_{m'}^*\|_{2 \to 2} \le c_{|m'-m|}$$

where

$$\sum_{n \ge 0} c_n = c < \infty.$$

Then

$$\sum_m \|T_m f\|_2^2 \lesssim c \|f\|_2^2.$$

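Proof We have

$$\sum_m \|T_m f\|_2^2 = \sum_m \langle T_m^* T_m f, f \rangle \le \Big\| \sum_m T_m^* T_m f \Big\|_2 \|f\|_2$$

$$= \Big( \sum_{m,m'} \langle T_m^* T_m f, T_{m'}^* T_{m'} f \rangle \Big)^{1/2} \|f\|_2 \le \Big( \sum_{m,m'} \|T_{m'} T_m^*\|_{2 \to 2} \|T_m f\|_2 \|T_{m'} f\|_2 \Big)^{1/2} \|f\|_2.$$

It now suffices to note that this can further be bounded via Cauchy-Schwarz by a constant multiple of

$$\Big( c \sum_m \|T_m f\|_2^2 \Big)^{1/2} \|f\|_2.$$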
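
Schur's test can be checked by hand in a toy model where every $T_m$ is a diagonal (multiplier) operator on $\mathbb{R}^d$, since then $\|T_m T_{m'}^*\|_{2\to 2}$ is simply the largest entry of the product of the two multipliers. The sketch below is only an illustration under that assumption; it uses the explicit constant $c_0 + 2\sum_{n \ge 1} c_n \le 2c$ that the argument above produces when both signs of $m' - m$ are counted. The function names are hypothetical.

```python
def schur_constants(t):
    """c_n = max over m of ||T_m T_{m+n}^*|| for diagonal T_m with real entries t[m][i]."""
    count = len(t)
    constants = []
    for n in range(count):
        norms = [max(abs(a * b) for a, b in zip(t[m], t[m + n]))
                 for m in range(count - n)]
        constants.append(max(norms))
    return constants


def square_function_sum(t, f):
    """Left-hand side: sum over m of ||T_m f||_2^2."""
    return sum(sum((a * x) ** 2 for a, x in zip(row, f)) for row in t)


def schur_bound(t, f):
    """Right-hand side: (c_0 + 2 * sum_{n >= 1} c_n) * ||f||_2^2."""
    c = schur_constants(t)
    return (c[0] + 2.0 * sum(c[1:])) * sum(x * x for x in f)
```

For instance, the multipliers `t[m][i] = 2.0 ** (-abs(m - i))` are concentrated near the coordinate `i = m`, so that $c_n = 2^{-n}$ and the operators are almost orthogonal in the sense required by the lemma.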



Corollary 7.5. With the notation in Proposition 7.3 we have for each 2 < p < oo 

m 

Proof To get the $p = 2$ case apply Schur's test to the families of operators $T_m = \Pi_{\mathcal{F}^{3m+i}}$
with $i \in \{0, 1, 2\}$. Note that $\Pi_{\mathcal{F}}^* f = \Pi_{\mathcal{F}}(f)$ for each forest $\mathcal{F}$.

We next get an $L^\infty$ bound. Since by (21) each forest consists of $O(k2^k)$ layers, Proposition 3.4
implies the bound $\|\Pi_{\mathcal{F}^m} f\|_\infty \lesssim k 2^k \|f\|_\infty$. Then we use interpolation [3] for the
vector valued operator $Vf = (\Pi_{\mathcal{F}^m} f)_{m \ge 1}$ like in the proof of Proposition 5.8. ■

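Lemma 7.6. With the notation in Proposition 7.3 we have for each $2 \le p < \infty$

$$\sum_m \|C_{\mathcal{F}^m} f\|_p^p \lesssim 2^{-k} \|f\|_p^p.$$

Proof Using that $\mathcal{F}^m$ is a Fefferman forest and the tree estimate Proposition 5.5 we get

$$\|C_{\mathcal{F}^m} f\|_p^p = \sum_{\mathbf{T} \in \mathcal{F}^m} \|C_{\mathbf{T}} f\|_p^p \lesssim 2^{-k} \sum_{\mathbf{T} \in \mathcal{F}^m} \|\Pi_{\mathbf{T}} f\|_p^p.$$

Consider the vector valued operator $Vf = (\Pi_{\mathbf{T}} f)_{\mathbf{T} \in \bigcup_m \mathcal{F}^m}$. Note that

$$\|Vf\|_{L^2(\ell^2)} = \Big( \sum_m \|\Pi_{\mathcal{F}^m} f\|_2^2 \Big)^{1/2} \lesssim \|f\|_2,$$

by Corollary 7.5, and also that

$$\|Vf\|_{L^\infty(\ell^\infty)} \lesssim \|f\|_\infty,$$

by virtue of Proposition 3.4. Interpolation [3] finishes the proof. ■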

The proof of (17) when p > 2 will now immediately follow from the next estimate, by 
invoking the triangle inequality. 

Proposition 7.7. Let $2 < p < \infty$. For each $f \in L^p([0,1])$ we have

$$\|C_{\mathbf{P}_k} f\|_p \lesssim 2^{-Lk} \|f\|_p$$

for some $L = L(p) > 0$ independent of $k$.

Proof Fix k. We can assume p is an integer and then use interpolation. The value of L 
will change throughout the argument. For each m > 0, let J 7 " 1 denote one of the forests 
J 7 ™'™ from Lemma 7.1. By the triangle inequality, it will suffice to prove that 



Ell C ^Hp+ E J\Cj^f\f[\Cj~Hf\<2 

m m 1 <m 2 <...<m p j=2 



-Lfc|| f np 
V 



Note that the first term is taken care of by Lemma 7.6. We next focus on the second term. 
The restriction p 2 + m x < m p in the summation can be achieved by splitting the integers 
in classes of residues modulo p 2 and using the triangle inequality. This separation will be
used to achieve extra decay. 

For each T e J 7 ™ 1 let as before Jt be the collection of all maximal dyadic intervals 
J C It such that J contains no Ip with PgT. Denote by St the support 

UilpHN-^up)) 
Per 

of C T /. Recall that the sets S T are pairwise disjoint. 

Since for each P e T we have I P C P^™ 1-1 ' ( 23 )-( 25 ) wil1 im p!y that 

|/ P n (suppcy mp /)| < e- (rn "- mi - 1)fc |/p|. 

Since the dyadic parent of J must equal I P for some P G T, we conclude that also 

|Jn (suppC^/)! < e -(^-^i-i)fe|j|. (26) 



Thus 



E /i^i/inicwi< E E E f \CTf\f[\cj~Hf\ 



m 1 <m 2 <...<m p 1£J-"°1 J£J T 



E EE/ icv/iniwwi 



p TeJ 71 ™! JeJ T Jn ( su PP c T m pf) i=2 



By invoking (13), (26) Lemma 3.2 and Holder's inequality this can further be bounded 

by ' 

E E E l^n(suppC'^p/)| 1 / p infM(VT/)( a ;)n( /" \ls T C^f\ p ) 1/p < 
^i<m 2 <...<m p Te^i JeJ T " z=2 J J 



E e ~ (mp ~ mi ~ 1)fc/p E E ( / [ M (^/)] p ) 1/p fl( / iis T cwn 1/p < 

m 1 <m 2 <...<m p TSJ 7 ™! JgJ T «=2 J 

E e-^— 1 )^ e nnT/n P n(/ii5 T ^/n 1/p < 

„ <r <r ™ „ T/- rmi „■ — o ^ 



"ii<m 2 <...<m p TeT m i i=2 



E e -^- rai -W( E linT/ll^^flll^/ll^ 

m 1 <m 2 <...<m p Te.F m l 1=2 

m l +P < m p 

In the last line we have used the orthogonality of U T f and the disjointness of St- 

Using (15), the forest estimate in Proposition 5.8, Holder's inequality and the fact that 

we upgrade the last estimate to 

E / icwi fl i<wi $ E e ~ {mp ~ mi ~ 1)k/p Tl lin^/llp 

m l < m 2 — ■■ -< m p i=2 mi <m2 <■ ..<m p 2 = 1 



<e-^||IW||£. 

m 

The Proposition will now follow from Corollary 7.5. ■ 
8. A PROOF WITHOUT ANY APPEAL TO THE CHOICE FUNCTION 

Let $\psi, \phi$ be Schwartz functions supported on $\mathbb{T} = [-1/2, 1/2)$ such that $\int \psi = 1$,
$\int \phi = 0$. Define the kernel

$$K(t,s) = \sum_{k<0} \psi_k(t) \phi_k(s)$$

where $\psi_k(t) = 2^{-k} \psi(t 2^{-k})$ and $\phi_k(s) = 2^{-k} \phi(s 2^{-k})$. Consider the trilinear form

$$\Lambda(F_1, F_2, F_3) = \int_{\mathbb{T}^4} F_1(x+t, y+s) F_2(x+s, y) F_3(x,y) K(t,s) \, dx \, dy \, dt \, ds,$$

where $+$ denotes the summation modulo one on the torus. This is an example of a dualized
two dimensional bilinear Hilbert transform.

Motivated in part by questions from Ergodic Theory, in [8] it is proved that

$$|\Lambda(F_1, F_2, F_3)| \lesssim \|F_1\|_{p_1} \|F_2\|_{p_2} \|F_3\|_{p_3}, \qquad (27)$$

whenever $1/p_1 + 1/p_2 + 1/p_3 = 1$ and $2 < p_i < \infty$.

Interestingly, this implies the $L^p$, $p > 2$ boundedness of the Carleson operator defined
as

$$Cf(x) = \sup_N \Big| \int_{\mathbb{T}} f(x+s) e^{iNs} \frac{ds}{s} \Big|,$$

and in particular, the almost everywhere convergence of the Fourier series $S_n f(x)$ in
the $p > 2$ regime. To see this, it suffices to apply (27) to $F_1(x,y) = f(y)$, $F_2(x,y) =
e^{ixN(y)} g(y)$, $F_3(x,y) = e^{-ixN(y)} h(y)$ with $\|g\|_{p_2} = \|h\|_{p_3} = 1$ and to an appropriate $\phi$ such
that $\sum_{k<0} \phi_k(s) = 1/s$ for $s \in [-1/4, 1/4] \setminus \{0\}$.

We will prove the Walsh model analogue of (27) for the dyadic kernel 

K w (t, 8 ) = Y,2*l Ik (t)h Ih (8), 

k<0 

where Ik = [0, 2~ k ] and hi k is the L 2 normalized Haar function. Define now 

$$\Lambda^W(F_1, F_2, F_3) = \int_{[0,1]^4} F_1(x \oplus t, y \oplus s) F_2(x \oplus s, y) F_3(x,y) K^W(t,s) \, dx \, dy \, dt \, ds.$$

Theorem 8.1.

$$|\Lambda^W(F_1, F_2, F_3)| \lesssim \|F_1\|_{p_1} \|F_2\|_{p_2} \|F_3\|_{p_3},$$
whenever $1/p_1 + 1/p_2 + 1/p_3 = 1$ and $2 < p_i < \infty$.

The definition of $\Lambda(F_1, F_2, F_3)$ involves no choice function $N(x)$, so we recover a proof
of the boundedness of the Carleson operator which makes no mention of it! The proof of 
Theorem 8.1 is very close in spirit to the proof of the boundedness of the Walsh model of 
the bilinear Hilbert transform [14]. This shows once more the deep connection between 
these two operators. 

Define the projection operators

$$\pi^1_\omega F(x,y) = \mathcal{F}_W^{-1}[1_\omega(\xi) \mathcal{F}_W(F)(\xi,\eta)](x,y)$$
$$\pi^2_\omega F(x,y) = \mathcal{F}_W^{-1}[1_\omega(\eta) \mathcal{F}_W(F)(\xi,\eta)](x,y).$$
It can be easily checked that

A W (F U F 2 ,F 3 ) = I nlMA F i( x MM x >y)*l.Hx,v)dxdy+ 
J\o,i] 2 1 ' 2 J 



I ' 1 !" ujeV + :\uj\>2 



E n lo M{ K lu F ^ x ^y)^li F ^ x ^y)^li F ^ x ^y) dxd v^ 

[ ,1]2 ^ .. . 2 



weX>+:|o;|>2 

where ooi,u u denote the left and right halves of u. 

By symmetry, it will suffice to estimate the first integral, which we denote by $\Lambda^{W,1}(F_1, F_2, F_3)$.
Let $\Omega$ be a collection of intervals in $\mathcal{D}_+$. Define

An(Fi,F 2 ,F 3 ) = / YV - M nl F 1 {x,y)^l u F 2 {x,y)^l u F 3 {x,y)dxdy. 

The first case of interest is when the intervals in $\Omega$ are nested. This will be a precursor
of the tree estimate in Lemma 8.6.

Proposition 8.2. Let $\Omega$ be a collection of intervals in $\mathcal{D}_+$ which contain a point $\xi$. Then

$$|\Lambda_\Omega(F_1, F_2, F_3)| \lesssim \|F_1\|_{p_1} \|F_2\|_{p_2} \|F_3\|_{p_3},$$
whenever $1/p_1 + 1/p_2 + 1/p_3 = 1$ and $1 < p_i < \infty$.

Proof By splitting Q in two parts, it suffices to assume that either £ G for all w 
or ( 6 w„ for all a;. In the first case, note that the intervals u u are pairwise disjoint. 
We estimate using Holder's inequality and the boundedness of the square function S 
associated with the intervals ui u on the first component 

\A n (F 1 ,F 2 ,F 3 )\< 

ll^(-^l)llpill'S'^2||p 2 ||5'i 7 3||p3 ~ II -^lllpi 11-^2 ||p 2 \\F?, ||p 3 . 

In the second case note that w« are pairwise disjoint. We will run a standard telescop- 
ing argument. Denote by Q the collection of all set differences between intervals u u of 
consecutive length. Note that each oj = u' u \ u u G Q is the union of at most two intervals 
whose length is at least \u u \. We call u ext the complement (in R + ) of the largest interval 
u u , u G f2, and Ff xt = 7r^ ea!t Fj for « = 2, 3. Note that trivially, for each w G f2 

n l u F i = F i Xt ~ E ^ ^ = 2 ' 3 - 

We can now write 

A n (Fi,F 2 ,F 3 ) = 



/ 



2 E ^M^i^- v) F 2 Xt ( x , y)F 3 ext (x, y)dxdy - (28) 
/ 2 E E ^ MflM^y) F 2 x ^^vHH^y)dxdy- (29)



E E < M 7riF 1 (x,y)7rlF 2 (x,y)iJr'(x,y)dxdy+ (30) 
ro il 2 - — 1 2 1 

1°'^ wen £ e n : £>£w„ 

+ / E E E i Mi^/l^- ^^^(X, t/jTTj/s^, 

/[n 112 — ~ _ •* — ~ l u > 2 J /qi\ 

L ' J wGS2(i e n : Cj<£u u cj'efi: uYw« 

The term (28) is controlled by Holder's inequality and the boundedness of the two di- 
mensional singular integral operator TF = XLen 71-1 M ^t^. To control (31) we note 

that the term corresponding to a triple 00,00,00' is nonzero only when 00,00' are adjacent 
to each other. This is simple computation involving the Walsh transform that will diag- 
onalize the sum. Then use Cauchy-Schwartz and the boundedness of the square function 
(Siien Wh^l 2 ) 1 ^ 2 an d of the maximal truncation operator 

A similar argument will take care of the terms (29) and (30), the details are left to the 
reader. 

■ 

The next step is to discretize the form A in both time and frequency. 



Definition 8.3. A one and a half dimensional bitile (tile) $P = R_P \times \omega_P$, for short $\frac32$-bitile
($\frac32$-tile), is a product of a dyadic square $R_P = I_P \times J_P \in \mathcal{D}_+^2$ and a dyadic interval
$\omega_P \in \mathcal{D}_+$ with $|\omega_P| = 2|I_P|^{-1}$ ($|\omega_P| = |I_P|^{-1}$). If $P$ is a $\frac32$-bitile, we define as before the
upper and lower $\frac32$-tiles $P_u = R_P \times \omega_{P_u}$, $P_l = R_P \times \omega_{P_l}$.

Call $\mathbf{P}^{3/2}_{all}$ the entire collection of $\frac32$-bitiles. There is a natural partial order on $\mathbf{P}^{3/2}_{all}$
which, by abusing earlier notation, will be denoted by $\le$. The reader can correctly
anticipate that $P \le P'$ will stand for $R_P \subset R_{P'}$ and $\omega_{P'} \subset \omega_P$. Convexity on $\mathbf{P}^{3/2}_{all}$ will be
understood with respect to this order.

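Definition 8.4. Let $R_{\mathbf{T}} \in \mathcal{D}_+^2$ be a dyadic square and let $\xi_{\mathbf{T}} \in \mathbb{R}_+$. A $\frac32$-tree $\mathbf{T}$ with top
data $(R_{\mathbf{T}}, \xi_{\mathbf{T}})$ is a collection of $\frac32$-bitiles in $\mathbf{P}^{3/2}_{all}$ such that $R_P \subset R_{\mathbf{T}}$ and $\xi_{\mathbf{T}} \in \omega_P$ for
each $P \in \mathbf{T}$.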

With each $\frac32$-bitile $P$ we will identify two regions in the phase space $\mathbb{R}_+^2 \times \mathbb{R}_+^2$.
One is the region covered by the two dimensional bitile $R_P \times [0, |\omega_P|] \times \omega_P$. The other one
is $R_P \times \omega_P \times \mathbb{R}_+$, which is the union of two dimensional bitiles of the form $R_P \times \omega_P \times \omega$.
We will denote by $\mathcal{C}_P$ the collection of all these two dimensional bitiles. It is easy to check
that if $\mathbf{P} \subset \mathbf{P}^{3/2}_{all}$ is convex then both collections of bitiles $\{R_P \times [0, |\omega_P|] \times \omega_P : P \in \mathbf{P}\}$ and
$\bigcup_{P \in \mathbf{P}} \mathcal{C}_P$ are convex with respect to the two dimensional order. We will denote by $\Pi^2_{\mathbf{P}} F$
and $\Pi^{3/2}_{\mathbf{P}} F$ the phase space projections associated with the two collections (cf. Remark
2.5)
Note that $\Pi^{3/2}_{\mathbf{P}} F$ can be thought of as being a one and a half dimensional projection,
since this operator produces no localization on the second frequency component. For
example, if $p$ is a $\frac32$-tile then

$$\Pi^{3/2}_p F(x,y) = \Big[ \int F(x',y) W_{I_p \times \omega_p}(x') \, dx' \Big] W_{I_p \times \omega_p}(x) 1_{J_p}(y)$$

$$= \sum_{|\omega| = |\omega_p|} \langle F, W_{I_p \times \omega_p} \otimes W_{J_p \times \omega} \rangle W_{I_p \times \omega_p}(x) W_{J_p \times \omega}(y).$$

For a convex | - tree T define 

A T (F 1 ,F 2 ,F 3 ) = V lR P (x,y)rf \. P \^l p F l (x,y)ixl p F 2 (x,y)irl p F 3 (x,y)dxdy, 
J{o,i]i peT 2 

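and observe that variants of (6) will imply that

$$\Lambda_{\mathbf{T}}(F_1, F_2, F_3) = \Lambda_{\mathbf{T}}(\Pi^2_{\mathbf{T}}(F_1), \Pi^{3/2}_{\mathbf{T}}(F_2), \Pi^{3/2}_{\mathbf{T}}(F_3)).$$

An argument very similar to the one in Proposition 8.2 will prove

$$|\Lambda_{\mathbf{T}}(F_1, F_2, F_3)| \lesssim \|F_1\|_{s_1} \|F_2\|_{s_2} \|F_3\|_{s_3},$$
whenever $1/s_1 + 1/s_2 + 1/s_3 = 1$ and $1 < s_i < \infty$. Combining these we get

$$|\Lambda_{\mathbf{T}}(F_1, F_2, F_3)| \lesssim \|\Pi^2_{\mathbf{T}}(F_1)\|_{s_1} \|\Pi^{3/2}_{\mathbf{T}}(F_2)\|_{s_2} \|\Pi^{3/2}_{\mathbf{T}}(F_3)\|_{s_3} \qquad (32)$$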

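Definition 8.5. Let $\mathbf{P} \subset \mathbf{P}^{3/2}_{all}$ be a collection of $\frac32$-bitiles. For $F : \mathbb{R}_+^2 \to \mathbb{C}$ define

$$\mathrm{size}_{F,2}(\mathbf{P}) = \sup_{P \in \mathbf{P}} \frac{\|\Pi^2_P F\|_2}{|R_P|^{1/2}}$$

$$\mathrm{size}_{F,3/2}(\mathbf{P}) = \sup_{P \in \mathbf{P}} \frac{\|\Pi^{3/2}_P F\|_2}{|R_P|^{1/2}}.$$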

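Lemma 8.6 (Tree estimate). Let $\mathbf{T} \subset \mathbf{P}^{3/2}_{all}$ be a convex $\frac32$-tree. Then

$$\|\Pi^2_{\mathbf{T}} F\|_p \lesssim |R_{\mathbf{T}}|^{1/p} \, \mathrm{size}_{F,2}(\mathbf{T}), \text{ for each } 1 < p < \infty.$$

If in addition $\|F\|_\infty \le 1$ then

$$\|\Pi^{3/2}_{\mathbf{T}} F\|_p \lesssim |R_{\mathbf{T}}|^{1/p} (\mathrm{size}_{F,3/2}(\mathbf{T}))^{2/p}, \text{ for each } 2 < p < \infty.$$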

Proof The crucial observation in both cases is that

$$\Pi^i_{\mathbf{T}} F(x,y) = \Pi^i_P F(x,y), \qquad (33)$$

for each $i \in \{\frac32, 2\}$, where $P$ is the minimal $\frac32$-bitile in $\mathbf{T}$ such that $(x,y) \in R_P$. This
follows like in the proof of Proposition 3.4. The first inequality then follows by noting
that $\|\Pi^2_P F\|_\infty \lesssim \|\Pi^2_P F\|_2 / |R_P|^{1/2}$, like in the one dimensional case.

Let us now analyze the second inequality. Let $\mathbf{P}_{\mathbf{T}}$ be the collection of all $\frac32$-bitiles in
$\mathbf{T}$ such that for each $P \in \mathbf{P}_{\mathbf{T}}$ there is $(x,y)$ such that $P$ is the minimal $\frac32$-bitile in $\mathbf{T}$
with $(x,y) \in R_P$. It is easy to see that

$$\sum_{P \in \mathbf{P}_{\mathbf{T}}} |R_P| \lesssim |R_{\mathbf{T}}|. \qquad (34)$$

Indeed, if $P \in \mathbf{P}_{\mathbf{T}}$, by the convexity of $\mathbf{T}$ at least one of the four dyadic children of $R_P$
will contain no $R_{P'}$ with $P' \in \mathbf{P}_{\mathbf{T}}$.

Next we observe that for each $\frac32$-bitile $P$,

$$\|\Pi^{3/2}_P F\|_\infty \le \|F\|_\infty, \qquad (35)$$

which together with Hölder's inequality gives, if $\|F\|_\infty \le 1$ and $2 < p < \infty$

$$\|\Pi^{3/2}_P F\|_p \lesssim |R_P|^{1/p} (\mathrm{size}_{F,3/2}(P))^{2/p}.$$
The result now follows by combining this with (33) and (34). ■

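The proof of Proposition 4.6 applies with essentially no modification to prove the following
variant.

Proposition 8.7 (Size decomposition). Let $i$ be either 2 or 3/2. Let $F : \mathbb{R}_+^2 \to \mathbb{C}$. Let
$\mathbf{P}$ be a finite convex collection of $\frac32$-bitiles.
Then

$$\mathbf{P} = \bigcup_{2^{-n} \le \mathrm{size}_{F,i}(\mathbf{P})} \mathbf{P}_n \cup \mathbf{P}_{null},$$

such that

• $\mathrm{size}_{F,i}(\mathbf{P}_n) \le 2^{-n}$

• $\mathbf{P}_n$ is a convex forest with convex $\frac32$-trees $\mathbf{T} \in \mathcal{F}_n$ satisfying

$$\sum_{\mathbf{T} \in \mathcal{F}_n} |R_{\mathbf{T}}| \lesssim 2^{2n} \|F\|_2^2$$

• $\Pi^i_P F \equiv 0$ for each $P \in \mathbf{P}_{null}$.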
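Proof [of Theorem 8.1] By multilinear restricted type interpolation it suffices to prove

$$|\Lambda^W(F_1, F_2, F_3)| \lesssim |E_1|^{1/p_1} |E_2|^{1/p_2} |E_3|^{1/p_3}$$

whenever $|F_i| \le 1_{E_i}$ for finite measure subsets $E_i$ of $\mathbb{R}_+^2$. Fix $2 < p_i < \infty$ for the rest of
the proof. We begin by observing that

$$\Lambda^{W,1}(F_1, F_2, F_3) = \sum_{P \in \mathbf{P}^{3/2}_{all}} \Lambda_P(F_1, F_2, F_3),$$

and thus it further suffices to prove

$$\Big| \sum_{P \in \mathbf{P}} \Lambda_P(F_1, F_2, F_3) \Big| \lesssim |E_1|^{1/p_1} |E_2|^{1/p_2} |E_3|^{1/p_3},$$

for all convex, finite $\mathbf{P} \subset \mathbf{P}^{3/2}_{all}$.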

Note that by (35) and the natural two dimensional extension of Proposition 4.3, we
have

$$\mathrm{size}_{F_1,2}(\mathbf{P}),\ \mathrm{size}_{F_2,3/2}(\mathbf{P}),\ \mathrm{size}_{F_3,3/2}(\mathbf{P}) \lesssim 1.$$

Let $\mathbf{P}^1_{n_1}$, $\mathbf{P}^2_{n_2}$, $\mathbf{P}^3_{n_3}$ be the collections provided by Proposition 8.7, relative to $\mathrm{size}_{F_1,2}(\mathbf{P})$,
$\mathrm{size}_{F_2,3/2}(\mathbf{P})$ and $\mathrm{size}_{F_3,3/2}(\mathbf{P})$, respectively. Define

$$\mathbf{P}_{n_1,n_2,n_3} := \mathbf{P}^1_{n_1} \cap \mathbf{P}^2_{n_2} \cap \mathbf{P}^3_{n_3}.$$

Note that

$$\sum_{P \in \mathbf{P}} \Lambda_P(F_1, F_2, F_3) = \sum_{2^{-n_1}, 2^{-n_2}, 2^{-n_3} \lesssim 1} \ \sum_{P \in \mathbf{P}_{n_1,n_2,n_3}} \Lambda_P(F_1, F_2, F_3). \qquad (36)$$

Reasoning like in the proof of Theorem 2.6 from Section 6, we organize $\mathbf{P}_{n_1,n_2,n_3}$ as a forest
in three different ways, with the $L^1$ norm of the counting function of the tops bounded
by $2^{2n_1}|E_1|$, $2^{2n_2}|E_2|$ and $2^{2n_3}|E_3|$, respectively.

Pick now $1 > \alpha > \max\{\frac{2}{p_2}, \frac{2}{p_3}\}$. Using (36), (32) with $s_2 = s_3 = 2/\alpha$ and then Lemma
8.6 we get

$$\Big| \sum_{P \in \mathbf{P}} \Lambda_P(F_1, F_2, F_3) \Big| \lesssim \sum_{2^{-n_1}, 2^{-n_2}, 2^{-n_3} \lesssim 1} 2^{-n_1 - n_2\alpha - n_3\alpha} \min(2^{2n_1}|E_1|, 2^{2n_2}|E_2|, 2^{2n_3}|E_3|)$$

$$\le \sum_{2^{-n_1}, 2^{-n_2}, 2^{-n_3} \lesssim 1} 2^{-n_1 - n_2\alpha - n_3\alpha} (2^{2n_1}|E_1|)^{1/p_1} (2^{2n_2}|E_2|)^{1/p_2} (2^{2n_3}|E_3|)^{1/p_3}$$

$$\lesssim |E_1|^{1/p_1} |E_2|^{1/p_2} |E_3|^{1/p_3}.$$
The argument is now complete. ■